Abstract. In recent years there has been a surge of interest in the statistics of
record-breaking events in stochastic processes. Along with that, many new and interesting
applications of the theory of records were discovered and explored. The record statistics 
of uncorrelated random variables sampled from time-dependent distributions was 
studied extensively. The findings were applied in various areas to model and explain 
record-breaking events in observational data. Particularly interesting and fruitful was 
the study of record-breaking temperatures and their connection with global warming, 
but records in sports, biology and some areas of physics have also been considered in recent
years. Similarly, researchers have recently started to understand the record statistics
of correlated processes such as random walks, which can be helpful to model record 
events in financial time series. This review is an attempt to summarize and evaluate 
the progress that was made in the field of record statistics throughout the last years. 
1. Why records?

In our competitive society, we care a lot about performance and we often feel the need 
to outperform others. Maybe this is why researchers have recently become more
and more interested in records. A record is simply an achievement, a result or some
other kind of measurement in a given chain of events that exceeds everything that has 
been encountered previously. Therefore a new record is always something remarkable 
which attracts attention, regardless of whether or not the occurrence of this record is 
considered good or bad. Records receive more attention and are remembered longer 
than other measurements because they show the boundary of what has been possible 
so far. In this context, the famous book 'Guinness World Records' holds its own record
as the best-selling copyrighted book in history [1].

An area where records are certainly of great interest is, of course, sports.
Particularly in athletics and in swimming, records like Olympic or world records
are always something special and noteworthy [2, 3]. But also in the context of
global warming, records have recently become particularly important and interesting
for climatologists. The question of how a changing climate affects the number of record
temperatures that we encounter has occupied both the general public and researchers
[4, 5, 6, 7, 8, 9, 10, 11, 12]. By now it is well established that global warming leads
to many new heat records and to a decreased number of new record-breaking cold
temperatures.

Records are important also in countless other areas of science. In physics, they
were discussed in the context of the theory of spin-glasses [13, 14] and
high-temperature superconductors [15, 16], but they also found applications in evolutionary
biology [17, 18, 19, 20]. Curiously, in 2010, the dynamics of ant movements were studied
using results from the theory of records [21]. Thanks to new theoretical results it was
recently possible to analyze and model the statistics of records in stock prices [22, 23, 24].

These data-oriented studies were accompanied and complemented by a substantial 
number of new theoretical results. The classical theory of records in time series of 
independent and identically distributed (i.i.d.) random variables was already developed 
many decades ago [25, 26, 27], but to understand the record statistics of more
complicated systems like the world's climate or evolutionary pathways, new techniques
beyond this standard model were needed. In this context, various processes of
uncorrelated random numbers sampled from time-dependent distributions were studied.
Most importantly, the Linear Drift Model (LDM), in which random numbers are drawn from a
distribution of unvarying shape but with an increasing mean value, was introduced already
in the 1980s and has been studied extensively [28, 29, 30, 31, 32, 33]. Some authors
also considered record events from broadening distributions [34, 35].

Also connected with problems in applying theoretical results from record
statistics to observational data are the so-called discreteness or rounding effects. Even
though most of the classical theory is developed for random numbers sampled from
continuous distributions, practical measurements are always imprecise and rounded
to a certain accuracy. Both the record statistics of random numbers from discrete
distributions [36, 37, 38, 39] and the consequences of analyzing records in time
series of random numbers that were drawn from continuous distributions and then
discretized in a measuring process were discussed in recent years [40].

In 2008, Majumdar and Ziff computed the record statistics of symmetric random
walks [41]. Their findings led to a series of new theoretical results and applications.
By now, the complicated record statistics of biased random walks and Lévy flights
[22, 23] as well as that of ensembles of multiple independent random walks [42] are
well understood. Records in continuous time random walks [43] and also in the distance
of higher dimensional jump processes from their origin [44] were studied similarly.

The purpose of this work is to summarize and evaluate these recent developments 
mostly from a theoretical point of view, but also with a short evaluation of recent data- 
driven studies in the field. The rest of this review is organized as follows: We will 
start with a brief introduction to the classical theory of records, where we introduce the 
important notation and present some elementary results. Then, in section [3], we describe 
recent developments in the field of record statistics of uncorrelated random variables 
that are sampled from time-dependent distributions. In this context we consider the
important Linear Drift Model of random variables with a linearly increasing mean value, 
as well as a model of increasing variance. 

In the subsequent section 4 we discuss various alternative models of records in
continuous and discrete random variables. In particular, we will consider the effects
of rounding and present generalized concepts like δ-records and geometric records, which are
record events that are only counted if they exceed a certain barrier above, or a certain
multiple of, the last record.

Then, in section 5, we analyze various stochastic processes with correlated entries,
starting with the symmetric random walk. After discussing the important results
of Majumdar and Ziff on the symmetric discrete-time random walk [41], we will
demonstrate how these findings can be generalized to biased random walks, to ensembles
of multiple symmetric random walks and to symmetric continuous time random walks.

Various important applications that were mentioned above are presented and
discussed in section 6. We will start by describing the progress made in the study
of temperature records in section 6.1. As an important application of the random walk model
we outline some recent results about the statistics of record-breaking stock prices in
section 6.2. Subsequently, we briefly mention some other applications, for instance in
physics, biology and in athletics (sections 6.3 and 6.4). Afterwards, in section 7, we give a
brief summary in which we assess the current state of research in the field of record statistics
and point out a number of interesting open questions and suggestions for future research.

2. Classical theory of records 

Figure 1. Sketch of the record process of i.i.d. RVs. The dots represent a time
series $X_0, X_1, X_2, \ldots$ of RVs drawn from a continuous distribution $f(x)$ (in this case a
standard normal distribution). The red (blue dotted) lines illustrate the progressions
of the upper (lower) record. Here, we find 5 upper and 4 lower records. In both cases
$X_0$ is the first record.

Let us consider a time series $X_0, X_1, \ldots, X_n$ of random variables (RV's), which can, for
instance, be a series of temperatures, stock prices, sports results or some other kind of
measurement process. In such a time series an entry $X_n$ is an upper record if it exceeds
all previous entries:

$$X_n > \max\{X_0, X_1, \ldots, X_{n-1}\}. \tag{1}$$

Analogously, a lower record is an entry with $X_n < \min\{X_0, X_1, \ldots, X_{n-1}\}$. In general,
one defines the first entry $X_0$ as the first (upper and lower) record. The record process
in the simple case of independent and identically distributed (i.i.d.) RVs is illustrated
in Fig. 1. Probably the two most studied quantities in the theory of records are the
record number $R_n$ and the probability $P_n$ for a record at time $n$. This probability $P_n$
for an upper record is defined as

$$P_n := \mathrm{Prob}\left[X_n > \max\{X_0, X_1, \ldots, X_{n-1}\}\right]. \tag{2}$$

In the following we will also refer to $P_n$ as the record rate. The record number $R_n$ is
simply the number of records that occurred in the time series up to time $n$. The mean
record number $\langle R_n \rangle$, the expected record number of a stochastic process, can
be expressed in terms of the record rate:

$$\langle R_n \rangle = \sum_{k=0}^{n} P_k. \tag{3}$$

In the case of i.i.d. RVs sampled from a continuous distribution with probability density
function (pdf) $f(x)$, one can easily compute the record rate and the mean record
number: With the so-called stick-shuffling argument one finds that the probability $P_n$
for a record at time $n$ in a time series of i.i.d. RV's is given by

$$P_n = \frac{1}{n+1}. \tag{4}$$

This is just the probability that in a random ordering of $n + 1$ RV's (sticks) the last
one ($X_n$) is the largest. With this result, the mean record number $\langle R_n \rangle$ in a series of
i.i.d. RV's takes the form

$$\langle R_n \rangle = \sum_{k=0}^{n} P_k = \sum_{k=0}^{n} \frac{1}{k+1} = H_{n+1} \xrightarrow{n \to \infty} \ln n + \gamma, \tag{5}$$

where $H_k$ is the $k$th harmonic number (cf. [45]) and $\gamma \approx 0.577215...$ the
Euler-Mascheroni constant [45, 25].
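Eqs. (3) to (5) can be checked with a short simulation. The following is a minimal sketch, assuming standard normal RVs (any continuous distribution should give the same result); the function names, the sample size and the seed are illustrative choices and not part of the original analysis.

```python
import random

def record_indicators(series):
    """Return 0/1 flags; flag k is 1 if series[k] is an upper record."""
    flags, current_max = [], float("-inf")
    for value in series:
        is_record = value > current_max
        flags.append(1 if is_record else 0)
        if is_record:
            current_max = value
    return flags

def estimate_record_rate(n, num_series=20000, seed=1):
    """Monte Carlo estimate of P_k (k = 0..n) for i.i.d. standard normal RVs."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(num_series):
        series = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
        for k, flag in enumerate(record_indicators(series)):
            counts[k] += flag
    return [count / num_series for count in counts]

n = 20
rates = estimate_record_rate(n)
mean_records = sum(rates)                             # Eq. (3)
harmonic = sum(1.0 / (k + 1) for k in range(n + 1))   # H_{n+1}, Eq. (5)
print("P_n estimate:", rates[n], " 1/(n+1):", 1.0 / (n + 1))
print("<R_n> estimate:", mean_records, " H_{n+1}:", harmonic)
```

Up to statistical fluctuations, the estimated record rate should follow $1/(n+1)$ and the estimated mean record number should agree with the harmonic number $H_{n+1}$.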

An important feature of record events in i.i.d. RV's is that they are stochastically
independent. The probability for a record in the $n$th event is independent of records
in previous entries [25, 27]. One can show that the joint probability $P_{n,m}$ of records
both at times $n$ and $m$ factorizes:

$$P_{n,m} := \mathrm{Prob}\left[X_n, X_m \text{ both records}\right] = P_n \cdot P_m. \tag{6}$$

By now, a lot more is known about the record statistics of i.i.d. RV's. A good
review can be found in the book by Arnold et al. [25], or Nevzorov [26] (see also [27]).
There, quantities like the distributions of record values with a given record number,
or the interesting waiting-time statistics between individual record events are discussed
in detail. A noteworthy finding is that the mean time $\langle T_{R_n} \rangle$ at which a record with
record number $R_n$ occurs is infinite (see also [46, 47]). Similarly, the inter-record times
$\Delta_{R_n} := T_{R_n} - T_{R_n - 1}$ have a divergent mean value $\langle \Delta_{R_n} \rangle$.

Furthermore, in the book by Arnold et al. [25], it is shown how to compute the
probability density function of a record value with a given record number $k$. Arnold
argues that due to the so-called lack-of-memory property of the exponential distribution
with $f(x) = e^{-x}$ (for $x > 0$), the pdf $f_k(x)$ of such a record value from this distribution
is given by

$$f_k(x) = \Gamma[k]^{-1} x^{k-1} e^{-x}, \tag{7}$$

where $\Gamma[k] = (k-1)!$ is the Gamma-function [45]. It is easy to show that a record with
record number $k$ from an exponential distribution is given by the value of the $(k-1)$th
record plus an exponential RV sampled from $f(x)$. Therefore, the pdf of the $k$th record
is just the convolution of $f_{k-1}(x)$ and $f(x)$. By iteration this leads eventually to the
Gamma-distribution in Eq. (7).

This result can be used to compute the distribution of the $k$th record value in time
series of RV's from arbitrary continuous distributions. For that purpose one has to
realize that a RV $X_i$ from any continuous pdf $f(x)$ has the same distribution as

$$F^{-1}\left(1 - \exp\left(-X_i^{(\exp)}\right)\right), \tag{8}$$

where $F^{-1}(x)$ is the inverse of the cumulative distribution function $F(x)$ of $f(x)$ and
$X_i^{(\exp)}$ an exponentially distributed RV with pdf $e^{-x}$ as before. This is easy to see
since $1 - \exp(-X_i^{(\exp)})$ is just uniformly distributed on the interval $[0,1)$, which is
the image space of $F(x)$. Using Eq. (8) we can infer that

$$F(x) = \mathrm{Prob}\left[X_i < x\right] = \mathrm{Prob}\left[F^{-1}\left(1 - e^{-X_i^{(\exp)}}\right) < x\right]
= \mathrm{Prob}\left[X_i^{(\exp)} < -\ln(1 - F(x))\right]. \tag{9}$$

With this result it is clear how to compute $f_k(x)$ in the general case. We just have to
replace the $x$ in Eq. (7) by $-\ln(1 - F(x))$ and multiply by the Jacobian $f(x)/(1 - F(x))$
of this substitution. This leads to

$$f_k(x) = \Gamma[k]^{-1} \left(-\ln(1 - F(x))\right)^{k-1} f(x). \tag{10}$$
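As a simple illustration of this substitution, consider the uniform distribution on $[0,1]$. This worked example is our own choice and is not taken from the cited literature; it assumes that the first entry carries record number $k = 1$ and that the normalization is $\Gamma[k]$ as above.

```latex
% Assumption: uniform pdf on [0,1], i.e. F(x) = x and f(x) = 1.
\begin{align*}
f_k(x) &= \frac{\left(-\ln(1-x)\right)^{k-1}}{\Gamma[k]}, \qquad 0 \le x < 1, \\
f_1(x) &= 1 \quad \text{(the first entry is simply uniformly distributed)}, \\
f_2(x) &= -\ln(1-x), \qquad \int_0^1 dx\, f_2(x) = 1 .
\end{align*}
```

The density of the second record value is thus concentrated near the upper boundary $x = 1$, as one would expect for a value that has to exceed the first entry.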

An important classical result is that the three universality classes of extreme value
statistics (EVS) [48] that describe the limiting distributions of the maximal value of
large sets of i.i.d. RV's (for an introduction cf. [49, 50, 51]) are also relevant in record
statistics. In 1973 Resnick showed that the distribution of a record value with a large
record number $R_n \to \infty$ can be rescaled to one of three limiting forms [52]:

• Negative-log-normal distribution: This distribution corresponds to the Weibull
class of EVS. Here the negative logarithms of the record values are normally
distributed.

• Normal distribution: Record values in series of RV's from the Gumbel class of
EVS approach a normal distribution for $R_n \to \infty$.

• Log-normal distribution: In the Fréchet class the logarithms of the record values
are normally distributed.

In the following, we will find that the three universality classes of EVS are also
of importance for the record statistics of time-dependent RV's. Many of the results
presented in this article will be characterized and discussed in the context of these
classes. In time series of correlated RV's, however, these classes lose their importance
and one finds different interesting universal characteristics.



3. Records in uncorrelated and time-dependent RV's 

While the classical theory introduced above deals with identically distributed RV's
drawn from a single, stationary pdf $f(x)$, one can also consider the more general scenario
of uncorrelated, but non-identically distributed random numbers $X_0, X_1, \ldots, X_n$ from a
time series of probability densities $f_k(x_k)$. In this general case, it is more complicated
to compute the record rate $P_n$ and the mean record number $\langle R_n \rangle$. Here, the record rate
can be obtained from the integral (cf. [31])

$$P_n = \int dx_n\, f_n(x_n) \prod_{k=0}^{n-1} F_k(x_n), \tag{11}$$

where $F_k(x_n) = \int_{-\infty}^{x_n} dx_k\, f_k(x_k)$ is the cumulative distribution function (cdf) of the pdf
$f_k(x_k)$. This expression is easy to understand, since it just integrates the probability
that the $n$th entry $X_n$ is equal to $x_n$, while all previous entries $X_k$, with $k < n$, are below
$x_n$, over all possible values of $x_n$.

Figure 2. Sketch of the record process of uncorrelated RVs with a linear drift
($c = 0.1$). The dots represent a time series $X_0, X_1, X_2, \ldots$ of RVs drawn from a
series of continuous distributions with $f_k(x) = f(x - ck)$. The red (blue dotted) lines
illustrate the progressions of the upper (lower) records.

In the following we present two possible choices for the series of probability densities
$f_k(x_k)$ that were studied in the literature and that proved to be useful in the analysis
of observational data.

3.1. The Linear Drift Model 

The Linear Drift Model (LDM) was first introduced by Ballerini and Resnick in the
1980's [28, 29] and later studied by Borovkov [30] and more recently by Franke et al. [31]
as well as Wergen et al. [32]. The model describes RVs drawn from a distribution that
retains its shape but has a time-dependent mean value. In particular, the RVs are
sampled from a series of pdf's $f_k(x_k) = f(x_k - ck)$, where $c$ is a real constant, which is
called the drift. The entries of such a time series are of the form

$$X_k = Y_k + ck, \tag{12}$$

where $Y_0, Y_1, \ldots, Y_n$ is a time series of i.i.d. RVs sampled from $f(x)$. The record process of
a series of RVs from the LDM is illustrated in Fig. 2. Here, the general, time-dependent
expression for the record rate (Eq. (11)) takes the following form:

$$P_n(c) = \int dx\, f(x - cn) \prod_{k=0}^{n-1} F(x - ck) = \int dx\, f(x) \prod_{k=1}^{n} F(x + ck). \tag{13}$$

Ballerini and Resnick [28] could prove that this record rate has an asymptotically
constant limiting value $P(c) = \lim_{n \to \infty} P_n(c)$ if the pdf $f(x)$ has a finite first moment
$\int dx\, x f(x) < \infty$. Determining the behavior of $P_n(c)$ in more detail is, however, a
difficult problem, which, in general, cannot be solved exactly.

There is an example of a pdf for which the record rate $P_n(c)$ in the LDM can be
calculated for arbitrary $n$: For the Gumbel distribution with the probability density
$f(x) = e^{-x} \exp(-e^{-x})$, Franke et al. [31] found that

$$P_n^{\mathrm{Gumbel}}(c) = \frac{1 - e^{-c}}{1 - e^{-c(n+1)}}. \tag{14}$$
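For the Gumbel density the integrand of Eq. (13) collapses to $e^{-x} \exp(-e^{-x} \sum_{k=0}^{n} e^{-ck})$, so that the integral evaluates to $(1 - e^{-c})/(1 - e^{-c(n+1)})$. The following sketch compares a direct numerical evaluation of Eq. (13) with this closed form. The integration bounds, the step count and all function names are illustrative assumptions.

```python
import math

def gumbel_pdf(x):
    """Gumbel density f(x) = exp(-x) * exp(-exp(-x))."""
    return math.exp(-x - math.exp(-x))

def gumbel_cdf(x):
    """Gumbel cdf F(x) = exp(-exp(-x))."""
    return math.exp(-math.exp(-x))

def record_rate_ldm(n, c, pdf, cdf, lo=-10.0, hi=40.0, steps=20000):
    """Midpoint-rule evaluation of Eq. (13): int dx f(x) prod_{k=1..n} F(x + c k)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        weight = pdf(x)
        for k in range(1, n + 1):
            weight *= cdf(x + c * k)
        total += weight
    return total * h

def record_rate_gumbel_closed(n, c):
    """Closed form for the Gumbel case (requires c != 0)."""
    return (1.0 - math.exp(-c)) / (1.0 - math.exp(-c * (n + 1)))

for n, c in [(10, 0.1), (5, 2.0)]:
    numeric = record_rate_ldm(n, c, gumbel_pdf, gumbel_cdf)
    print(n, c, numeric, record_rate_gumbel_closed(n, c))
```

The helper `record_rate_ldm` accepts any pair of pdf and cdf, so the same routine can be used to explore Eq. (13) for other distributions, provided the integration bounds are adapted to their support.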

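Similarly, it is possible to compute the asymptotic record rate for the exponential
distribution with $f(x) = \nu e^{-\nu x}$ (with $x > 0$ and $\nu > 0$). In this case, the record rate
$P_n(c)$ assumes the following form:

$$P_n(c) = \int_0^\infty dx\, \nu e^{-\nu x} \prod_{k=1}^{n} \left(1 - e^{-\nu(x + ck)}\right)
= \int_0^1 dy\, \frac{(y, e^{-c\nu})_{n+1}}{1 - y}, \tag{15}$$

where $(a, q)_n$ is the q-Pochhammer symbol with $(a, q)_n := \prod_{k=0}^{n-1} (1 - a q^k)$ [45]. With
this we can expand the asymptotic record rate $P(c)$ in powers of $e^{-c\nu}$ and find

$$P(c) = \int_0^1 dy\, \frac{(y, e^{-c\nu})_\infty}{1 - y}
\approx 1 - \frac{1}{2} e^{-c\nu} - \frac{1}{2} e^{-2c\nu} - \frac{1}{6} e^{-3c\nu} - \frac{1}{6} e^{-4c\nu} + O\left(e^{-5c\nu}\right). \tag{16}$$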
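The coefficients of this expansion can be checked numerically. With $y = e^{-\nu x}$ and $q = e^{-c\nu}$ the asymptotic record rate is $\int_0^1 dy \prod_{k \geq 1} (1 - y q^k)$. The sketch below truncates the infinite product and integrates with the midpoint rule. The truncation order, the step count and the function names are assumptions made for illustration.

```python
import math

def asymptotic_rate_numeric(q, kmax=60, steps=4000):
    """Midpoint rule for int_0^1 dy prod_{k=1..kmax} (1 - y q^k), with q = exp(-c nu)."""
    powers = [q ** k for k in range(1, kmax + 1)]
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        prod = 1.0
        for p in powers:
            prod *= 1.0 - y * p
        total += prod
    return total * h

def asymptotic_rate_series(q):
    """Expansion up to fourth order in q = exp(-c nu), as in Eq. (16)."""
    return 1.0 - q / 2 - q ** 2 / 2 - q ** 3 / 6 - q ** 4 / 6

q = math.exp(-2.0)   # corresponds to c * nu = 2
print(asymptotic_rate_numeric(q), asymptotic_rate_series(q))
```

For $c\nu = 2$ the two values should agree up to terms of order $e^{-5c\nu}$. For smaller $c\nu$ the truncated series is expected to deteriorate.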

By using computer algebra software, such as Mathematica, it is possible to compute
arbitrary higher order terms of this expansion. However, we found that, in comparison
with numerical simulations of $P(c)$, the expansion up to the 4th order in Eq. (16) is
already very accurate and fails only for small $c \to 0$ [53]. For this case, one can compute
the record rate $P_n(c)$ by a different approach. Replacing the product in Eq. (15) by the
exponential of a sum of logarithms leads to

$$P_n(c) = \int_0^\infty dx\, \nu \exp\left(-\nu x + \sum_{k=1}^{n} \ln\left(1 - e^{-\nu(x + ck)}\right)\right)
\approx \int_0^\infty dx\, \nu \exp\left(-\nu x - \frac{e^{-\nu x}}{c\nu} \left(1 - e^{-c\nu n}\right)\right), \tag{17}$$

where, for the second step, we expanded the logarithm to first order and replaced the sum
by an integral assuming that $n \gg 1$ and $c\nu \ll 1$. With this we obtain the small $c$
behavior of $P_n(c)$ for the exponential case:

$$P_n(c) \approx \frac{c\nu}{1 - e^{-c\nu n}} \left[1 - \exp\left(-\frac{1 - e^{-c\nu n}}{c\nu}\right)\right], \tag{18}$$

which, for small $c\nu \ll 1$ and $c\nu n \gg 1$, approaches $c\nu$. Apparently, for small $c$, the
record rate of the exponential distribution depends linearly on $c$. Comparing with numerical
simulations, we found that Eq. (18) describes $P_n(c)$ accurately for $c\nu < 1$ [53].
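The quality of the small-drift approximation can be illustrated by comparing it with a direct numerical evaluation of Eq. (15). After the substitution $y = e^{-\nu x}$, the exact record rate is $\int_0^1 dy \prod_{k=1}^{n} (1 - y e^{-c\nu k})$, while the approximation reads $(1 - e^{-a})/a$ with $a = (1 - e^{-c\nu n})/(c\nu)$. The following sketch uses the hypothetical parameters $n = 200$ and $c\nu = 0.05$; the function names and the step count are illustrative.

```python
import math

def record_rate_exponential(n, c_nu, steps=4000):
    """Midpoint rule for int_0^1 dy prod_{k=1..n} (1 - y exp(-c nu k)), cf. Eq. (15)."""
    powers = [math.exp(-c_nu * k) for k in range(1, n + 1)]
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        prod = 1.0
        for p in powers:
            prod *= 1.0 - y * p
        total += prod
    return total * h

def record_rate_small_drift(n, c_nu):
    """Small-drift approximation (1 - exp(-a)) / a with a = (1 - exp(-c nu n)) / (c nu)."""
    a = (1.0 - math.exp(-c_nu * n)) / c_nu
    return (1.0 - math.exp(-a)) / a

exact = record_rate_exponential(200, 0.05)
approx = record_rate_small_drift(200, 0.05)
print(exact, approx)
```

For these parameters both values should lie close to $c\nu = 0.05$, in line with the linear dependence on $c$ discussed above.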

In the article by Franke et al. [31], the record rate for a more general set of
continuous probability distributions was computed in two different regimes. For small $c$,
Franke et al. derived approximate results for finite values of $n$ in the regime of $nc \ll \sigma$,
where $\sigma$ is usually the standard deviation or some other measure of the width of $f(x)$.
The same can be done in the opposite regime of $c \to \infty$. It turns out that the behavior
of $P_n(c)$ depends systematically on the three classes of EVS. Franke et al. [31] discussed
their findings in the context of these classes.

3.1.1. The regime of small cn. In the small $c$ regime, it is possible to expand the record
rate $P_n(c)$ into powers of $c$. Expansion up to the first order yields

$$\begin{aligned}
P_n(c) &= \int dx\, f(x) \prod_{k=1}^{n} F(x + ck) \\
&\approx \int dx\, f(x) \prod_{k=1}^{n} \left(F(x) + ck f(x)\right) \\
&\approx \int dx\, f(x) F^n(x) + \frac{c}{2} n(n+1) \int dx\, f^2(x) F^{n-1}(x).
\end{aligned} \tag{19}$$

The first summand in the last line is the stationary record rate $P_n(c = 0) = 1/(n+1)$.
With

$$I_n := \int dx\, f^2(x) F^{n-1}(x) \tag{20}$$

this leads to

$$P_n(c) \approx \frac{1}{n+1} + \frac{c}{2} n(n+1) I_n. \tag{21}$$

This expansion is accurate if the underlying distribution $f(x)$ varies only slowly between
$x$ and $x + cn$. For many probability densities this can be translated into $cn \ll \sigma$ (with
$\sigma^2 := \int dx\, x^2 f(x)$).

In [31], $I_n$ was computed for several representative distributions from the three
classes of extreme value statistics. Here, we want to derive $I_n$ and $P_n(c)$ for a Generalized
Pareto Distribution (GPD). We consider RV's from the cdf

$$F_\xi(x) = \begin{cases} 1 - (1 + \xi x)^{-1/\xi}, & \text{for } \xi \neq 0, \\ 1 - e^{-x}, & \text{for } \xi = 0. \end{cases} \tag{22}$$

$\xi \in \mathbb{R}$ is the shape parameter of $F_\xi(x)$. For $\xi \geq 0$ this distribution has an infinite support
and is defined for $x > 0$, for $\xi < 0$ it is defined on the finite interval $x \in [0, -1/\xi]$.
Depending on $\xi$, the GPD can be in all three classes of EVS. For $\xi < 0$, $F_\xi(x)$ is
in the Weibull class, for $\xi = 0$ in the Gumbel class and for $\xi > 0$ in the Fréchet class.

For this distribution, the integral $I_n$ can be evaluated by elementary means and we
find that

$$I_n = \begin{cases} B(n, 2 + \xi), & \text{for } \xi \neq 0, \\ 1/(n(n+1)), & \text{for } \xi = 0, \end{cases} \tag{23}$$

where $B(x, y) = \Gamma[x]\, \Gamma[y] / \Gamma[x + y]$ is the Beta-function [45]. Using Stirling's
approximation [45], we find that for $n \gg 1$, the record rate of the GPD with a drift
$c \ll n^{\xi - 1}$ is given by

$$P_n(c) \approx \frac{1}{n+1} + \frac{c}{2} \begin{cases} \Gamma[2 + \xi]\, n^{-\xi}, & \text{for } \xi \neq 0, \\ 1, & \text{for } \xi = 0. \end{cases} \tag{24}$$
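The Beta-function form of $I_n$ can be verified by integrating $f^2(x) F^{n-1}(x)$ directly for the GPD with density $f(x) = (1 + \xi x)^{-1/\xi - 1}$. The sketch below does this for one representative of each class. The cut-off of the infinite support at $x = 200$, the step count and the function names are assumptions made for this illustration.

```python
import math

def gpd_cdf(x, xi):
    """Cdf of the Generalized Pareto Distribution with shape parameter xi."""
    if xi == 0.0:
        return 1.0 - math.exp(-x)
    return 1.0 - (1.0 + xi * x) ** (-1.0 / xi)

def gpd_pdf(x, xi):
    """Pdf of the GPD, i.e. the derivative of gpd_cdf with respect to x."""
    if xi == 0.0:
        return math.exp(-x)
    return (1.0 + xi * x) ** (-1.0 / xi - 1.0)

def integral_I(n, xi, steps=100000):
    """Midpoint rule for I_n = int dx f(x)^2 F(x)^(n-1)."""
    # Finite support [0, -1/xi] for xi < 0; otherwise the tail is cut off at x = 200.
    upper = -1.0 / xi if xi < 0 else 200.0
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += gpd_pdf(x, xi) ** 2 * gpd_cdf(x, xi) ** (n - 1)
    return total * h

def beta(a, b):
    """Beta function expressed through Gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

for xi in (-0.5, 0.0, 0.5):
    print(xi, integral_I(5, xi), beta(5, 2.0 + xi))
```

Note that the exponential case is contained in the general expression, since $B(n, 2) = 1/(n(n+1))$.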

This result summarizes how a small linear drift affects the record rates depending on the
extreme value class of the underlying distribution. Although this is no conclusive proof,
we conjecture that the effect of the drift generally increases with $n$ for distributions
of the Weibull class. In the Fréchet class the effect decays with $n$ and the drift is
asymptotically negligible. The Gumbel class is intermediate between these two cases.
Interestingly, since for $\xi > 1$, $n^{\xi - 1}$ grows with $n$, some of the results for the Fréchet class
(for $\xi > 1$) are also correct in the asymptotic limit $n \to \infty$.

To better understand the behavior of $P_n(c)$ in the Gumbel class, Franke et al. [31]
considered the Generalized Gaussian Distribution (GGD) with

$$f(x) = \left(2\Gamma\left[1 + \beta^{-1}\right]\right)^{-1} e^{-|x|^\beta} \tag{25}$$

with $\beta > 0$. They could show that, here, $n^2 I_n$ grows logarithmically with $n$:

$$I_n \propto n^{-2} \ln(n)^{1 - 1/\beta}. \tag{26}$$

This expression includes the important case of a Gaussian distribution for $\beta = 2$. For
the Gaussian probability density $f(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-x^2/2\sigma^2}$ one obtains

(27) 

3.1.2. Correlations in the Linear Drift Model. An interesting subtlety that was
discovered in the study of the LDM is the fact that record events in this process are not
stochastically independent as in the i.i.d. case. In particular, it was found by Wergen et
al. [32] that the probability $P_{n,m}(c)$ of records in the entries $n$ and $m$ in a series of RV's
with a linear drift can differ from the product of the record rates $P_n(c)$ and $P_m(c)$. In
[32], the probability $P_{n,n+1}(c)$ of having two consecutive records was studied in detail.
They showed that, depending on the choice of the underlying distribution, $P_{n,n+1}(c)$
can be both smaller and larger than $P_n(c) \cdot P_{n+1}(c)$. Therefore, the probability for a
second record in step $n + 1$ after a record in step $n$ can be both increased and decreased
with respect to the unconditional probability $P_{n+1}(c)$.

Wergen et al. [32] defined the ratio

$$l_{n,n+1}(c) := \frac{P_{n,n+1}(c)}{P_n(c)\, P_{n+1}(c)}, \tag{28}$$

which is always given by $l_{n,n+1}(c = 0) = 1$ in the i.i.d. case. In the regime of small $c$
and $n \gg 1$, we can again use the GPD (Eq. (22)) to illustrate the asymptotic behavior
of this quantity. Expanding $l_{n,n+1}(c)$ up to first order in $c$ with the same method as
before, we find that
r 29 
for £ = 0, 

which is again valid for $c \ll 1$. Apparently, the exponential distribution with
$f(x) = e^{-x}$ ($\xi = 0$) plays a special role. For the distributions of the Weibull
class with $\xi < 0$, the inter-record correlations are negative for a positive drift $c > 0$;
for the representatives of the Fréchet class ($\xi > 0$), $l_{n,n+1}(c > 0)$ is larger than one and
grows with $n$. Only in the exponential case does $l_{n,n+1}(c)$ assume an $n$-independent value
slightly above unity.

Again, the intermediate Gumbel regime can be studied more systematically using
the GGD with $f(x) \propto e^{-|x|^\beta}$ as before. In a lengthy calculation, Wergen et al. [32]
showed that, here, for $n \gg 1$, the ratio $l_{n,n+1}(c)$ behaves like

Z n „ n+1 (c) « 1 - cnA (\ - jj In (n) 1 "^ . (30) 

with a positive constant $A$, which depends on $n$. Apparently, for $\beta < 1$, stretched
exponential distributions, which are broader than the exponential, have positive correlations
that grow logarithmically with $n$. Distributions decaying faster than the exponential ($\beta > 1$)
lead to negative correlations.

Even though it is not clear how to explain the emergence of these correlations and,
in particular, why it is possible that records occur more frequently after a preceding
record, these effects turn out to be useful. It is a well-known problem for
experimentalists to decide whether or not a series of measurements is drawn from an
underlying distribution with so-called heavy tails (see for instance [54, 55, 56]). Franke
et al. [33] proposed a test that uses the findings presented in [32] for this purpose.

For this test a set of measurements $X_0, X_1, \ldots, X_n$ has to be shuffled randomly before
adding an artificial linear drift. For a random permutation $\pi_0, \pi_1, \ldots, \pi_n$ of $0, 1, \ldots, n$ such
a shuffled and artificially drifted set of data is given by

$$X_{\pi_0}, X_{\pi_1} + c, X_{\pi_2} + 2c, \ldots, X_{\pi_n} + nc. \tag{31}$$

Now one can analyze the inter-record correlations in this time series. In particular, one
has to compute the ratio $l_{n,n+1}(c)$. As shown in [33], the statistics can be improved
significantly by averaging over many different random permutations. If the correlations
in the drifted time series are positive ($l_{n,n+1}(c) > 1$) this is a good indicator for
measurements from a distribution which is at least broader than the exponential one.
Franke et al. [33] demonstrated that this record-based test makes it possible to detect these
heavy-tail properties already in very small data sets with fewer than 64 data points. In this
context, the test might be better than standard methods like, for instance, maximum
likelihood estimators, which are commonly used for problems of this type. However, a
thorough comparison of this new test with the existing ones has not been performed
yet.

Figure 3. Sketch of the record process of uncorrelated (Gaussian) RVs from a
broadening distribution. The sticks represent a time series $X_0, X_1, X_2, \ldots$ of RVs
drawn from a series of continuous and symmetric distributions $f_k(x) = f(x k^{-\alpha})$
(here: $\alpha = 1.1$). The red (blue dotted) lines illustrate the progression of the upper
(lower) record.
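The shuffling procedure described above can be sketched as follows. This is only a minimal illustration and not the implementation used in [33]: the estimator for $l_{n,n+1}(c)$ at a single fixed index $n$, the number of permutations and all function and parameter names are assumptions.

```python
import random

def record_correlation_ratio(data, c, n, num_perm=20000, seed=0):
    """Estimate l_{n,n+1}(c) from shuffled data with an artificial drift c * k.

    Hypothetical estimator: the probabilities P_n, P_{n+1} and P_{n,n+1} are
    replaced by record frequencies over many random permutations of `data`.
    """
    if n < 1 or n + 2 > len(data):
        raise ValueError("need 1 <= n and at least n + 2 data points")
    rng = random.Random(seed)
    values = list(data)
    rec_n = rec_n1 = rec_both = 0
    for _ in range(num_perm):
        rng.shuffle(values)
        drifted = [values[k] + c * k for k in range(n + 2)]
        max_before_n = max(drifted[:n])
        is_rec_n = drifted[n] > max_before_n
        is_rec_n1 = drifted[n + 1] > max(max_before_n, drifted[n])
        rec_n += is_rec_n
        rec_n1 += is_rec_n1
        rec_both += is_rec_n and is_rec_n1
    if rec_n == 0 or rec_n1 == 0:
        return float("nan")
    return (rec_both / num_perm) / ((rec_n / num_perm) * (rec_n1 / num_perm))

# Usage example with 64 synthetic measurements (standard normal, illustrative only).
rng = random.Random(42)
sample = [rng.gauss(0.0, 1.0) for _ in range(64)]
print(record_correlation_ratio(sample, c=0.0, n=5))   # without drift: close to 1
print(record_correlation_ratio(sample, c=0.1, n=5))   # with an artificial drift
```

Without drift the shuffled series behaves like an i.i.d. series, so the estimate should fluctuate around one. Any deviation found for $c > 0$ has to be judged against this statistical noise.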

3.2. The Increasing Variance Model 

In 1975, Yang [57] introduced a model of growing populations to explain the increased
record rate in sports due to a growing number of athletes who try to break records.
Yang showed that any exponentially growing population of athletes leads to an
asymptotically constant record rate $\lim_{n \to \infty} P_n > 0$.

Building on this model, Krug considered random variables $X_0, X_1, \ldots, X_n$ from
a series of probability densities $f_k(x_k)$ with a time-dependent width:

$$f_k(x_k) = \lambda_k f(\lambda_k x_k). \tag{32}$$

In particular, Krug discussed distributions with a power-law time-dependence and

$$\lambda_k = k^{-\alpha}. \tag{33}$$

Such a process is illustrated in Fig. 3 for an $\alpha$ slightly larger than one. Apparently, the
distribution broadens for $\alpha > 0$ and gets sharper when $\alpha < 0$. Here, the record rate
$P_n(\alpha)$ takes the following form:

$$P_n(\alpha) = \int dx\, n^{-\alpha} f\left(x n^{-\alpha}\right) \prod_{k=1}^{n-1} F\left(x k^{-\alpha}\right)
= \int dx\, f(x) \prod_{k=1}^{n-1} F\left(x \left(\frac{n}{k}\right)^{\alpha}\right). \tag{34}$$

Krug [34] computed the asymptotic behavior of the record rate $P_n(\alpha)$ and the mean
record number $\langle R_n(\alpha) \rangle$ for this model in the context of the three universality classes
of EVS. The effect of a broadening distribution with $\alpha > 1$ is similar to the effect of a
positive drift in the LDM. For distributions of the Fréchet class, the broadening width
does not systematically change the large $n$ behavior of the record rate. At the same
time, it has the strongest effects in the Weibull class.

Using the findings of Krug [34], we can calculate the asymptotic behavior of the
record rate $P_n(\alpha)$ for the GPD $F_\xi(x)$, which was introduced in section 3.1.1. For large
$n \to \infty$ and $\alpha > 1$ we obtain

{ap-t^n-t 1 -®' 1 , for£<0 (Weibull class) 

^ for f = (Exp. distribution) (35) 

^ for £ > (Frechet class) 
For the mean record number, this leads to 

{c^-^V^ 1 -^ 1 , for£<0 (Weibull class) 
$(\ln(n))^2$ for $\xi = 0$ (Exp. distribution) (36)

$\ln(n)$ for $\xi > 0$ (Fréchet class)

For a Gaussian distribution, the asymptotic results only differ in the prefactors 
from the exponential case. 

Krug also studied the correlations between the record events in this model
numerically. In contrast to the LDM, he found only negative correlations between
records from RV's with an increasing variance. As in the case of the LDM, it is still
unclear how to explain these correlations in an intuitive way.



4. Discreteness, rounding and ties 

The theoretical results in the previous section were all derived for RV's from entirely
continuous distributions. In the context of experimental measurements and their record
statistics, but also for purely mathematical reasons, one can also study the records of
discrete RV's. In this case the statistics of records is more
complicated and, in principle, there are several different approaches to this problem.

On the one hand, it is possible to consider the record statistics of RV's from
distributions which are inherently discrete. Two prominent examples are

• the discrete uniform distribution with equal probabilities for a finite number
of values: $P[X = k] = 1/N$ (with $N \in \mathbb{N}$ and $k = 1, \ldots, N$),

• and the geometric distribution with $P[X = k] = (1 - p)^{k-1} p$ (with $p \in [0, 1]$ and
$k \in \mathbb{N}$).

In the case of a discrete distribution, it is possible that a record value, for instance
an entry $X_i$, gets tied by a succeeding RV $X_j = X_i$. This is impossible for RV's
sampled from a continuous pdf. In the case of a tie, one has to decide whether or not
one wants to count this tie as a new record. In the literature about record statistics
from discrete distributions, records without ties are usually called strong records, while
records including ties are called weak records.

For the discrete uniform distribution, it is very easy to compute the strong record
rate $P_n$, which is given by the following sum:

$$P_n = \sum_{k=1}^{N} \frac{1}{N} \left(\frac{k-1}{N}\right)^n. \tag{37}$$

For $n \to \infty$ this behaves like $P_n \sim N^{-1} (1 - N^{-1})^n$, which leads, of course, to a finite
mean record number: summing Eq. (37) over all $n$ gives $\langle R_n \rangle \to H_N$. For the weak record
rate $p_n$ the situation is different and one finds that the asymptotic record rate is given by
$p_n \sim N^{-1}$, which leads to a divergent weak mean record number of $\langle r_n \rangle \sim n/N$.
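Both record rates of the discrete uniform distribution can be evaluated exactly with rational arithmetic. The sketch below assumes the strong rate $\sum_{k=1}^{N} N^{-1} ((k-1)/N)^n$ and the corresponding weak rate $\sum_{k=1}^{N} N^{-1} (k/N)^n$, in which ties with the current maximum are allowed. The function names and the example of a six-sided die ($N = 6$) are illustrative choices.

```python
from fractions import Fraction

def strong_record_rate(n, N):
    """Probability that X_n strictly exceeds X_0..X_{n-1} for X_k uniform on {1..N}."""
    return sum(Fraction(1, N) * Fraction(k - 1, N) ** n for k in range(1, N + 1))

def weak_record_rate(n, N):
    """Same as above, but ties with the current maximum also count as records."""
    return sum(Fraction(1, N) * Fraction(k, N) ** n for k in range(1, N + 1))

N = 6  # a fair die
for n in (1, 10, 50):
    strong = strong_record_rate(n, N)
    asymptotic = Fraction(1, N) * Fraction(N - 1, N) ** n
    print(n, float(strong), float(asymptotic), float(weak_record_rate(n, N)))
```

For growing $n$ the strong record rate should approach $N^{-1} (1 - N^{-1})^n$, while the weak record rate should saturate at $1/N$.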

The case of the geometric distribution is already much more complicated and
was considered by Prodinger in 1996 [37] (see also Vervaat [36]). He derived the
asymptotic mean record number $\langle R_n \rangle$ in the strong case for the geometric distribution
with $P[X = k] = (1 - p)^{k-1} p$. In a rather complicated computation, he showed that for
$n \to \infty$:



Wa, M(i- P) -) ( to "^-gW f ) + f < 38 > 

with an imaginary constant $a_p = (2k\pi i) / \ln(1 - p)^{-1}$. The occurrence of the oscillatory
term in this expression is quite surprising and, to our knowledge, it is difficult to explain
this effect intuitively.

Apart from that, it is also interesting to consider discreteness effects in RV's from 
continuous distributions. While in our considerations in sections |2] and [3] a record entry 
X n was simply a value that was larger than all previous values X , Xi, ...,X n _i, one can 
also impose different, more complicated, conditions, where records are only counted if 
they exceed another barrier depending on X ,Xi, ...,X n _ 1 . Some important examples 
that have been studied in the literature are the following: 

• Rounded records: An entry $X_n$ is a strong record if the rounded value $\lfloor X_n \rfloor_\Delta$ exceeds the maximum of all previous rounded entries:

$$\lfloor X_n \rfloor_\Delta > \max\{\lfloor X_0 \rfloor_\Delta, \lfloor X_1 \rfloor_\Delta, ..., \lfloor X_{n-1} \rfloor_\Delta\}.$$ (39)
Here, $\lfloor \cdot \rfloor_\Delta$ means rounding (up or down) to the next integer multiple $k \cdot \Delta$ of $\Delta$ with $k \in \mathbb{Z}$. Similarly, we have a weak record if

$$\lfloor X_n \rfloor_\Delta \geq \max\{\lfloor X_0 \rfloor_\Delta, \lfloor X_1 \rfloor_\Delta, ..., \lfloor X_{n-1} \rfloor_\Delta\}.$$ (40)
This process is illustrated for exponential RV's in Fig. [4] (top).

Figure 4. Top: Sketch of the record process of (exponential) i.i.d. RVs, which are rounded down to the next integer. A continuous RV $X_k$ (dots) is a new record if $\lfloor X_k \rfloor$ exceeds all previous values. The progression of the (upper) record value is given by the red line. Bottom: Sketch of the $\delta$-record process of (exponential) i.i.d. RVs for $\delta = 1$. An entry $X_k$ is counted as a new $\delta$-record, if it is larger than $\max\{X_1, ..., X_{k-1}\} + \delta$. Again, the progression of the (upper) record value is given by the red line.



• $\delta$-records: A $\delta$-record is an entry $X_n$ that exceeds all previous entries $X_0, X_1, ..., X_{n-1}$ at least by $\delta$:

$$X_n > \max\{X_0 + \delta, X_1 + \delta, ..., X_{n-1} + \delta\}.$$ (41)

Note that $\delta$ can, in principle, also be negative. Such a record is called a strong record for $\delta > 0$ and a weak record if $\delta < 0$. This record model is sketched in Fig. [4] (bottom) for $\delta = 1$ and RVs from an exponential distribution.

• Geometric records: For a geometric record, an entry $X_n$ has to exceed a fixed multiple of all previous entries:

$$X_n > \max\{a X_0, a X_1, ..., a X_{n-1}\},$$ (42)

where $a > 0$ is an arbitrary constant. Here, a record is a strong record if $a > 1$ and a weak record for $a < 1$.
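The three record conditions listed above can be turned into simple counting routines. The sketch below is only an illustration under stated assumptions: the first entry is always counted as a record, rounding is done downwards, geometric records are applied to positive entries, and all function names are hypothetical.

```python
import math


def count_rounded_records(xs, delta, strong=True):
    """Count records of entries rounded down to integer multiples of delta."""
    count, best = 0, None
    for x in xs:
        cell = math.floor(x / delta)  # lattice index k of the cell [k*delta, (k+1)*delta)
        if best is None or cell > best or (not strong and cell == best):
            count += 1
            best = cell
    return count


def count_delta_records(xs, delta):
    """Count entries that exceed the maximum of all previous entries by more than delta."""
    count, best = 0, None
    for x in xs:
        if best is None or x > best + delta:
            count += 1
        # The running maximum is updated by every entry, record or not.
        best = x if best is None else max(best, x)
    return count


def count_geometric_records(xs, a):
    """Count entries that exceed a times the maximum of all previous (positive) entries."""
    count, best = 0, None
    for x in xs:
        if best is None or x > a * best:
            count += 1
        best = x if best is None else max(best, x)
    return count
```

For the short series `[1.0, 1.25, 0.5, 2.0, 2.75, 3.0]` one can follow the conditions by hand: with $\Delta = 1$ there are 3 strong and 5 weak rounded records, with $\delta = 0.5$ there are 3 $\delta$-records, and with $a = 1.5$ there are 2 geometric records.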

In the following, we summarize how the record statistics in these three cases differ 
from the continuous case. As in the above, the three universality classes of EVS will 
play an important role. In all three cases, the findings will differ systematically between 
these classes. 



4.1. Rounding effects

The record statistics of rounded measurements were first considered by Wergen et al. in 2012 [30]. In a previous study, they analyzed historical temperature measurements from U.S. weather stations [HE] that were recorded in whole degrees Fahrenheit. They found that this discreteness had a significant effect on the record statistics of the temperature data that could, in principle, disguise a possible effect of global warming on the occurrence of record-breaking events [9], [30]. The problem is more general: In all applications, experimental measurements can only be recorded up to a certain accuracy. Usually, one has to deal with RV's, which are sampled from a hypothetical continuous distribution and then discretized by rounding in the measurement process.

In 2012, Wergen et al. [30] studied the strong record rate and the mean record number of i.i.d. RV's from a continuous pdf $f(x)$ that were rounded down to integer multiples of a discretization length $\Delta$. They showed that the strong record rate $P_n^\Delta$, in this case, can be computed from the following sum:

$$P_n^\Delta = \sum_k \left[F((k+1)\Delta) - F(k\Delta)\right] F^{n-1}(k\Delta).$$ (43)

This is just the sum over the probabilities for a new record at time $n$ on the individual lattice sites $k\Delta$ (with $k \in \mathbb{Z}$). For $\Delta \to 0$, it is easy to show that $P_n^\Delta$ approaches the continuous result with $P_n = \int dx\, f(x) F^{n-1}(x)$ as in section [2].
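The lattice sum for the strong record rate of rounded RV's is easy to evaluate numerically. The following sketch assumes a distribution with non-negative support (here the standard exponential distribution as an illustrative choice) and truncates the sum once the remaining tail probability is negligible; the function names and the cut-off parameters are assumptions made for this example.

```python
import math


def exp_cdf(x):
    """CDF of the standard exponential distribution (illustrative choice)."""
    return 1.0 - math.exp(-x) if x > 0.0 else 0.0


def rounded_strong_record_rate(n, delta, cdf, k_start=0, tail_tol=1e-15, k_limit=10**7):
    """Sum over lattice sites k*delta of P[new entry in cell k] * P[all n-1 previous below cell k]."""
    total = 0.0
    for k in range(k_start, k_limit):
        lower = cdf(k * delta)
        upper = cdf((k + 1) * delta)
        # (upper - lower): the n-th entry falls into cell k;
        # lower**(n-1): all previous entries lie in strictly lower cells.
        total += (upper - lower) * lower ** (n - 1)
        if 1.0 - upper < tail_tol:  # remaining probability mass is negligible
            break
    return total
```

For a small discretization length, e.g. `rounded_strong_record_rate(5, 0.001, exp_cdf)`, the result is close to the continuous value $1/n = 0.2$, while a coarse lattice such as $\Delta = 1$ gives a smaller strong record rate.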

In the limit of $n \to \infty$, it is possible to analyze the asymptotic behavior of Eq. [43] with respect to the universality classes of EVS. For that purpose, we can again use the GPD (Eq. [22]). Since interesting results can only be expected for a discretization length $\Delta$ much smaller than the support of the distribution, we can approach the problem by replacing the sum in Eq. [43] by an integral. In this case, however, the bounds of integration have to be chosen carefully. Then, the strong record rate for the GPD is given by

pA ^ilt 1 dx[F((k + l)A)-F(kA)]F-HkA), for£<0 

" ~ \l i°° dx l F (0 + 1) A) - F (JfeA)] F 11 - 1 (kA) , for f > 

Here, the upper bound for the Weibull class 4 — 1 is simply the finite number of lattice 
sites in [1, 1 — minus one. In the case of weak records one has to omit the minus one. With Eq. [44] we can compute the large $n$ limit of $P_n^\Delta$ and find that

^e~ nA ~\ for£<0 (Weibull class) 

P * 1^ \ iS i 1 ~ e_A ) for £ = ( Ex P- distribution) (45) 
- for £ > (Frechet class) 

Apparently, the record rate changes systematically in the Weibull class, while, in the Fréchet class, the asymptotic behavior is as in the continuous case. The corresponding results for the weak record rate $p_n^\Delta$ are the following:

{(1 + f (1 - A))"? , for f < (Weibull class) 
-ir (e A + e~ A ) for £ = (Exp. distribution) (46) 
- for £ > (Frechet class) 

Here, the asymptotic weak record rate in the Weibull class is constant and equals the probability that a RV falls into the largest lattice site. Both cases show that rounding effects are important for RV's from the Weibull class, while they are negligible in the Fréchet class. The behavior in the Gumbel class is, as usual, intermediate between those two. While the (strong and weak) record rates of the exponential distribution are still proportional to $1/n$, Wergen et al. [40] found sublinear corrections to the $1/n$-behavior for the GGD with $f(x) \propto e^{-|x|^\beta}$. Here, for $\beta > 1$, the strong and weak record rates decay as

$$P_n^\Delta \propto \frac{1}{n} \ln(n)^{\frac{1}{\beta} - 1} \quad \text{and} \quad p_n^\Delta \propto \frac{1}{n} \ln(n)^{1 - \frac{1}{\beta}}.$$ (47)

Note that, even though these results were derived for the special case of 'rounding down', they do not change systematically if one considers other kinds of rounding like 'rounding up' or 'rounding to the nearest integer'.

Wergen et al. [40] also considered the interesting regime of very strong discreteness with $\Delta \gg 1$. Here, the occurrence of records becomes predictable on a logarithmic time-scale for certain distributions from the Gumbel class. For a detailed discussion of this phenomenon we refer the reader to [40].



4.2. $\delta$-records

The concept of $\delta$-records (or near-records) was discussed by various authors, for instance
by Gouet et al. [38j EHJ EH] or Balakrishnan et al. [601 ET] - In particular Gouet et al. 
made important progress on this problem. In [55], they discussed the process of $\delta$-records in detail and proved a limit theorem for the asymptotic distribution of the record number in this case. Instead of describing their rather complex derivations, we will now demonstrate an elementary approach that illustrates the asymptotic behavior of $\delta$-records in time series of RV's from the three classes of EVS in the regime of small $\delta \ll 1$. Our findings are in good agreement with the results of Gouet et al. [55].

In the general case, it is easy to see that the record rate of $\delta$-records can be obtained from the integral

$$P_n^\delta = \int dx\, f(x) F^{n-1}(x - \delta).$$ (48)

Again, for $\delta = 0$, we obtain the continuous result with $P_n = 1/n$. As in the case of the LDM (see section [3]) we can now expand this integral for small values of $\delta$ and $n \gg 1$. Doing this we find

$$P_n^\delta \approx \int dx\, f(x) \left(F^{n-1}(x) - \delta (n-1) f(x) F^{n-2}(x)\right) \approx \frac{1}{n} - \delta n I_n,$$ (49)

with the same $I_n = \int dx\, f^2(x) F^{n-2}(x)$ as in section [3]. With the results for $I_n$ described in that section, we can now compute the rate of $\delta$-records for RV's from a GPD in the regime of small $\delta$. Here, we find that

For $\xi \geq 0$ (Exponential distribution and Fréchet class) this result is correct even in the limit $n \to \infty$. In the Weibull class with $\xi < 0$ it holds for $\delta \ll n^{\xi}$. The result illustrates nicely how $\delta$ affects the record rate in the $\delta \ll 1$ regime. As in the case of rounding discussed before, $\delta$ is negligible in the Fréchet class and has a strong effect that increases with $n$ in the Weibull class. It is straightforward to show that, in the Weibull class, the record rate will eventually decay exponentially, which leads to a finite asymptotic record number [38], [39].
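For the exponential distribution the integral for the $\delta$-record rate can be checked directly. The sketch below is an illustration under the assumption $f(x) = e^{-x}$ for $x > 0$ and uses a plain midpoint rule; the function name and the integration parameters are hypothetical. For $\delta \geq 0$ the substitution $u = e^{\delta - x}$ gives the closed form $e^{-\delta}/n$, and with $I_n = 1/(n(n-1))$ the first-order expansion in $\delta$ becomes $(1 - \delta)/n$.

```python
import math


def delta_record_rate_exp(n, delta, x_max=60.0, steps=200_000):
    """Midpoint-rule estimate of int dx f(x) F(x - delta)^(n-1) for the exponential distribution."""
    h = x_max / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        y = x - delta
        # F(y) = 1 - exp(-y) for y > 0 and 0 otherwise (support of the exponential pdf).
        cdf_shifted = 1.0 - math.exp(-y) if y > 0.0 else 0.0
        total += math.exp(-x) * cdf_shifted ** (n - 1)
    return total * h
```

For example, `delta_record_rate_exp(10, 0.1)` agrees with $e^{-0.1}/10$ to high accuracy and differs from the first-order value $0.09$ only at order $\delta^2$.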

Again, the case of the Gumbel class is more complicated. For the GGD with $f(x) \propto e^{-|x|^\beta}$ one finds that, for small $\delta \ll 1$,

$$P_n^\delta \approx \frac{1}{n} - \delta \frac{A_\beta}{n} \ln(n)^{1 - \frac{1}{\beta}},$$ (51)

with a positive constant $A_\beta$, which depends on the tail parameter $\beta$. While, for $\beta < 1$, this approximation is valid for arbitrary values of $n$, it only holds for $\ln(n)^{1 - \frac{1}{\beta}} \ll \delta^{-1}$ when $\beta$ is larger than one. This result indicates that the marginal case of $\beta = 1$ plays an important role.

In fact, it is well known that the mean spacings $\langle \Delta_k \rangle$ between subsequent records with record numbers $k$ and $k+1$ from an exponential distribution do not depend on $k$, i.e. the mean record values are equidistant [25]. For all (light-tailed) distributions decaying faster than the exponential, e.g. with $\beta > 1$, these mean spacings decrease with increasing $k$ and the record values move closer and closer together. For $\beta < 1$ and (heavy-tailed) distributions broader than the exponential, the spacings increase with $k$. Only in the regime of $\beta > 1$, the spacings will eventually become smaller than any $\delta$. Eventually, as shown rigorously by Gouet et al. [59], for very large $n$, this leads to a slow exponential decay of the record rate for all distributions with $\beta > 1$. Using the results of Gouet et al. [59] one can compute the (exact) asymptotic mean record number for the GGD in the large $n$ limit:



for > 1 



503-1) 

(#«) <j In (n) e" 5 , for = 1 (52) 

In (n) , for /3 < 1 

with positive constants and depending on (3. 

4.3. Geometric records

The first author who discussed the problem of geometric records was Eliazar in 2005 [62]. A geometric record is a record that is only counted if it exceeds a certain multiple of the previous record. In particular, in order to be a record, $X_n$ has to be larger than $a \cdot \max\{X_0, X_1, ..., X_{n-1}\}$ with a positive constant $a$ (not necessarily larger than one). The record rate $P_n^a$ for this problem is given by

$$P_n^a = \int dx\, f(x) F^{n-1}\left(\frac{x}{a}\right).$$ (53)

For the exponential distribution with $f(x) = e^{-x}$ ($x > 0$), this integral can be computed exactly. Here we find

$$P_n^a = \int_0^\infty dx\, e^{-x} \left(1 - e^{-x/a}\right)^{n-1} = \frac{a\, \Gamma[a]\, \Gamma[n]}{\Gamma[n + a]},$$ (54)

which reproduces the i.i.d. record rate for $a = 1$. Interestingly, for $a > 1$, this indicates a finite asymptotic mean record number. For the GPD with $\xi \neq 0$, the situation is more complicated and we cannot compute the record rate $P_n^a$ directly for other representatives of the distribution. In a recent article, Gouet et al. [59] proved a series of theorems for the record rate of geometric records that can be used to give the asymptotic record rate of the GPD in the geometric case.
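For the exponential distribution, the geometric record rate can be cross-checked numerically. Substituting $u = e^{-x/a}$ turns the integral $\int_0^\infty dx\, e^{-x} (1 - e^{-x/a})^{n-1}$ into a Beta function, $a\, \Gamma[a]\, \Gamma[n] / \Gamma[n+a]$, which behaves like $a\, \Gamma[a]\, n^{-a}$ for large $n$. The following sketch compares this expression with a midpoint-rule estimate; the function names and integration parameters are assumptions made for this illustration.

```python
import math


def geometric_record_rate_exact(n, a):
    """a * Gamma(a) * Gamma(n) / Gamma(n + a), evaluated with log-gamma for numerical stability."""
    return a * math.exp(math.lgamma(a) + math.lgamma(n) - math.lgamma(n + a))


def geometric_record_rate_numeric(n, a, x_max=80.0, steps=200_000):
    """Midpoint-rule estimate of int_0^inf dx exp(-x) * (1 - exp(-x/a))^(n-1)."""
    h = x_max / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += math.exp(-x) * (1.0 - math.exp(-x / a)) ** (n - 1)
    return total * h
```

For $a = 1$ the exact expression reduces to $1/n$, and for $a = 2$ it equals $2/(n(n+1))$, which is summable in $n$ and therefore compatible with a finite asymptotic mean record number.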

Gouet et al. showed that the record rate $P_n^a$ of distributions of the Fréchet class does not differ significantly from the i.i.d. case. In the Weibull class, on the other hand, the asymptotic record rate goes to zero for $a > 1$. In the Gumbel class, the situation is more complicated and, for $a > 1$, the mean record number can be either finite or divergent. Interestingly, in this class, one also finds distributions, where the mean record number goes to infinity with a slower than logarithmic speed. With the results presented in [59], we can infer the record rate $P_n^a$ for the GPD for $n \to \infty$ and $\xi \geq 0$:

$$P_n^a \approx \begin{cases} a\, \Gamma[a]\, n^{-a}, & \text{for } \xi = 0 \text{ (Exp. distribution)} \\ a^{-1/\xi}\, n^{-1}, & \text{for } \xi > 0 \text{ (Fréchet class)} \end{cases}$$ (55)

In the Weibull class, we expect an exponential decay of $P_n^a$, but, so far, we are not aware of any analytical results for this regime.



Figure 5. Sketch of a symmetric random walk with a Gaussian jump distribution.
The red (blue dotted) lines mark the progression of the upper (lower) record of the 
process. 



5. Records in correlated processes 

5.1. Records in symmetric, discrete-time random walks 

An entirely new field of research was established through the work of Majumdar and Ziff [41], who were the first to consider the record statistics of symmetric random walks. In contrast to most of the previous research in the field of record statistics, they considered a correlated process, namely a symmetric, discrete-time random walk (an introduction can be found in [63], [64]), and computed its record rate, its mean record number and also the full distribution of the record number. In the following, we will summarize and discuss their important findings.

A discrete-time random walk (DTRW) $X_0, X_1, ..., X_n$ is a time series with entries of the form

$$X_i = X_{i-1} + \eta_i$$ (56)

with i.i.d. increments $\eta_i$ drawn from a continuous and symmetric distribution $f(\eta)$ (see also Fig. [5]). Without loss of generality, we can set $X_0 = 0$. Then, by definition, $X_0 = 0$ is also the first record.

To compute the record statistics of this process, it is helpful to introduce two generally important quantities, the first-passage probability $\phi(x,n)$ and the survival probability $q(x,n)$ (cf. [65]). The (positive) first-passage probability is the probability that a random walk, starting at 0, crosses $x > 0$ in time step $n$ for the first time:

$$\phi(x,n) := \mathrm{Prob}\left[X_n > x \;\&\; X_0, X_1, ..., X_{n-1} < x\right].$$ (57)

The related (positive) survival probability $q(x,n)$ is the probability that the random walk remains below $x$ for the first $n$ steps:

$$q(x,n) := \mathrm{Prob}\left[X_0, X_1, ..., X_n < x\right].$$ (58)

It is easy to see that the first-passage probability can also be obtained by $\phi(x,n) = q(x,n-1) - q(x,n)$.

In the special case of a symmetric DTRW ($f(\eta) = f(-\eta)$) and $x = 0$, these quantities can be computed using an important theorem by Sparre Andersen [66], [67]. He showed that, in this case, the generating function of the survival probability $q(0,n)$, defined as $\tilde{q}(0,z) = \sum_{n=0}^{\infty} q(0,n)\, z^n$, is given by

$$\tilde{q}(0,z) = \frac{1}{\sqrt{1-z}}.$$ (59)

Expanding in powers of $z$ this leads to $q(0,n) = \binom{2n}{n} 2^{-2n}$.
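The universality of this result can be illustrated with a small Monte Carlo experiment. The sketch below estimates the probability that a walk starting at the origin stays negative during its first $n$ steps and compares it with $\binom{2n}{n} 2^{-2n}$ for three continuous, symmetric jump distributions. The samplers, function names, seed and number of trials are assumptions chosen for this illustration.

```python
import math
import random


def survival_exact(n):
    """q(0, n) = binom(2n, n) / 4^n for continuous, symmetric jump distributions."""
    return math.comb(2 * n, n) / 4 ** n


def survival_mc(n, draw_jump, trials=20_000, seed=1):
    """Fraction of simulated walks with X_1, ..., X_n all below the origin."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        x = 0.0
        for _ in range(n):
            x += draw_jump(rng)
            if x >= 0.0:  # the walk crossed the origin: no survival
                break
        else:
            survived += 1  # loop finished without crossing
    return survived / trials


def gaussian_jump(rng):
    return rng.gauss(0.0, 1.0)


def uniform_jump(rng):
    return rng.uniform(-1.0, 1.0)


def cauchy_jump(rng):
    # Inverse-CDF sampling of a standard Cauchy variable (heavy-tailed, still symmetric).
    return math.tan(math.pi * (rng.random() - 0.5))
```

Within statistical errors, `survival_mc(10, gaussian_jump)`, `survival_mc(10, uniform_jump)` and `survival_mc(10, cauchy_jump)` all agree with `survival_exact(10)`, although the three jump distributions have very different tails.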

This result was the most important requirement for the work of Majumdar and Ziff [41]. They showed that a random walk of length $n$ with $R_n$ records can be described as a chain of $R_n - 1$ first-passage problems and one survival problem. This is possible because of the so-called renewal property of the random walk. After a record at time $i$, the probability for the next record to occur at time $i + j$ is the same as the probability $\phi(0,j)$ that a random walk starting from 0 crosses the origin (from negative to positive) after $j$ steps for the first time. As long as the process stays below the origin set by the record at time $i$, no further records occur.

Therefore, the probability $P(i_1, ..., i_{R_n}; n)$ for a random walk with records at times $0, i_2, i_3, ..., i_{R_n}$ (with $i_1 = 0$ by definition and $0 < i_2 < i_3 < ... < i_{R_n} \leq n$) can be given by

$$P(i_1, ..., i_{R_n}; n) = \phi(0, i_2) \cdot \phi(0, i_3 - i_2) \cdot ... \cdot \phi(0, i_{R_n} - i_{R_n - 1}) \cdot q(0, n - i_{R_n}).$$ (60)

With this, the distribution $P(R_n|n)$ of the record number $R_n$ can be obtained by summing over all possible sets of record times $0, i_2, i_3, ..., i_{R_n}$ with $0 < i_2 < ... < i_{R_n} \leq n$. The easiest way to compute this sum is via the generating function of $P(R_n|n)$. Majumdar and Ziff found that $P(R_n|n)$ obeys

$$\sum_{n=R_n-1}^{\infty} P(R_n|n)\, z^n = \left(\tilde{\phi}(z)\right)^{R_n - 1} \tilde{q}(z) = \left(1 - (1-z)\, \tilde{q}(z)\right)^{R_n - 1} \tilde{q}(z)$$ (61)

and, with the generating function $\tilde{q}(z) = 1/\sqrt{1-z}$ of the survival probability of the symmetric random walk, one finds

$$\sum_{n=R_n-1}^{\infty} P(R_n|n)\, z^n = \frac{\left(1 - \sqrt{1-z}\right)^{R_n - 1}}{\sqrt{1-z}}.$$ (62)

This result allowed Majumdar and Ziff to extract the exact distribution of the record number $R_n$:

$$P(R_n|n) = \binom{2n - R_n + 1}{n}\, 2^{-2n + R_n - 1}.$$ (63)

From this expression, one can easily obtain the mean record number $\langle R_n \rangle$ and the record rate $P_n$ of the symmetric DTRW. For the generating function of $\langle R_n \rangle$, one has to multiply Eq. [61] with the record number $R_n$ and sum over all possible values for $R_n$. This leads to

$$\sum_{n=0}^{\infty} \langle R_n \rangle\, z^n = \sum_{R_n=1}^{\infty} R_n \left(\tilde{\phi}(z)\right)^{R_n - 1} \tilde{q}(z) = \frac{1}{(1-z)^{3/2}}.$$ (64)

Expanding this result in powers of $z$ we find

$$\langle R_n \rangle = (2n+1) \binom{2n}{n} 2^{-2n} \quad \text{and} \quad P_n = \binom{2n}{n} 2^{-2n}.$$ (65)
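The exact record number distribution of the symmetric DTRW, $P(R_n|n) = \binom{2n - R_n + 1}{n} 2^{-2n + R_n - 1}$ for $R_n = 1, ..., n+1$, and the mean record number $\langle R_n \rangle = (2n+1) \binom{2n}{n} 2^{-2n}$ can be checked against each other with rational arithmetic. The function names in this sketch are illustrative.

```python
import math
from fractions import Fraction


def record_number_pmf(r, n):
    """Probability of exactly r records (including the initial one) after n steps."""
    if r < 1 or r > n + 1:
        return Fraction(0)
    return Fraction(math.comb(2 * n - r + 1, n), 2 ** (2 * n - r + 1))


def mean_record_number(n):
    """Closed form (2n + 1) * binom(2n, n) / 4^n of the mean record number."""
    return Fraction((2 * n + 1) * math.comb(2 * n, n), 4 ** n)


def mean_from_pmf(n):
    """Mean record number obtained by summing r * P(r | n) over all possible r."""
    return sum(r * record_number_pmf(r, n) for r in range(1, n + 2))
```

For any $n$ the probabilities sum to one and `mean_from_pmf(n)` coincides with `mean_record_number(n)`. For large $n$ the mean approaches $\sqrt{4n/\pi}$, e.g. the ratio is already very close to one for $n = 1000$.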

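It is interesting to analyze the asymptotic behavior of these quantities in the limit of $n \to \infty$. Here, the record number approaches a half-Gaussian distribution with

$$P(R_n|n) \approx \frac{1}{\sqrt{\pi n}}\, e^{-\frac{R_n^2}{4n}}.$$ (66)

The mean record number and the record rate converge to

$$\langle R_n \rangle \approx \sqrt{\frac{4n}{\pi}} \quad \text{and} \quad P_n \approx \frac{1}{\sqrt{\pi n}}.$$ (67)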



Majumdar and Ziff also considered discrete random walks on a lattice with lattice constant $d$ and a jump distribution $f(x) = \frac{1}{2}\left(\delta(x - d) + \delta(x + d)\right)$. In this case, the asymptotic record statistics is very similar to the continuous case and differs only in a prefactor. For $n \to \infty$, the mean record number and the record rate are reduced by a factor of $1/\sqrt{2}$:

$$\langle R_n \rangle \approx \sqrt{\frac{2n}{\pi}} \quad \text{and} \quad P_n \approx \frac{1}{\sqrt{2\pi n}}.$$ (68)

In the article by Majumdar and Ziff [41] one can also find a discussion of the extreme value statistics of the ages of the longest and shortest lasting records in a symmetric random walk. In particular, they showed that the expected age of the longest lasting record grows proportional to the walk length $n$ and not to $\sqrt{n}$ as the average age of a record and also the age of the shortest lasting record.



5.2. Biased random walks 

A natural way to generalize the model of a symmetric DTRW considered by Majumdar and Ziff is to introduce a bias. The entries $X_0, X_1, ..., X_n$ of such a biased random walk with a constant drift $c$ are given by

$$X_i = X_{i-1} + \eta_i + c,$$ (69)

where the $\eta_i$'s are i.i.d. RV's from a symmetric distribution $f(\eta)$ as in the previous section and again $X_0 = 0$. As in the case of the LDM for uncorrelated time series, the drift causes a non-universal and distribution dependent behavior of the record statistics of the DTRW. The simple, universal version of the Sparre Andersen theorem is not valid in the biased case and the first-passage and survival probabilities of the random walk depend on the choice of the jump distribution $f(\eta)$.

Fortunately, there exists a more general version of Sparre Andersen's theorem that holds also in the biased case. Sparre Andersen [66], [67] showed that the (positive) survival probability with respect to the origin of the biased random walk $q_c(0,n)$ has the generating function

$$\tilde{q}_c(0,z) = \sum_{n=0}^{\infty} q_c(0,n)\, z^n = \exp\left[\sum_{n=1}^{\infty} \frac{z^n}{n}\, p_c(n)\right],$$ (70)

where $p_c(n)$ is the probability that a random walk is negative at time $n$: $p_c(n) := P[X_n < 0]$. In the case of an unbiased random walk with $c = 0$, we have $p_0(n) = \frac{1}{2}$, which reduces Eq. [70] to the simple symmetric version of the Sparre Andersen theorem introduced in the previous section (Eq. [59]). In the general case, it is more complicated to compute $p_c(n)$ and the probability density of the random walk at time $n$ is needed.
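A generating function of the form $\exp\left[\sum_{n \geq 1} z^n p_c(n)/n\right]$ can be expanded with a simple recursion: differentiating with respect to $z$ and comparing coefficients gives $n\, q_c(0,n) = \sum_{k=1}^{n} p_c(k)\, q_c(0,n-k)$ with $q_c(0,0) = 1$. The sketch below implements this recursion. As an illustration it assumes Gaussian jumps with standard deviation `sigma`, for which $X_n$ is Gaussian with mean $cn$ and variance $\sigma^2 n$; the function names are hypothetical.

```python
import math


def survival_from_sparre_andersen(p_negative, n_max):
    """Coefficients q[0..n_max] of exp(sum_{n>=1} z^n * p_negative(n) / n)."""
    q = [1.0]
    for n in range(1, n_max + 1):
        # n * q_n = sum_{k=1}^{n} p(k) * q_{n-k}
        q.append(sum(p_negative(k) * q[n - k] for k in range(1, n + 1)) / n)
    return q


def gaussian_p_negative(c, sigma=1.0):
    """P[X_n < 0] for a walk with drift c and Gaussian jumps (assumption of this example)."""
    def p_negative(n):
        # X_n ~ N(c*n, sigma^2*n), hence P[X_n < 0] = Phi(-c*sqrt(n)/sigma).
        return 0.5 * math.erfc(c * math.sqrt(n) / (sigma * math.sqrt(2.0)))
    return p_negative
```

With `p_negative = lambda n: 0.5` the recursion reproduces the symmetric result $\binom{2n}{n} 2^{-2n}$. A positive drift lowers the survival probability below the origin and a negative drift raises it.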

Majumdar et al. [23] found that the asymptotic behavior of $q_c(0,n)$ depends crucially on the tail of the jump distribution $f(\eta)$. Since the behavior of the distribution $f(\eta)$ for large values of $\eta$ is dictated by the small $k$ behavior of its Fourier transform $\hat{f}(k)$, Majumdar et al. considered jump distributions with Fourier representations of the form

$$\hat{f}(k) \approx 1 - |l_\mu k|^\mu$$ (71)

for small values of $k$. Here, $l_\mu$ is a constant parameter and $\mu \in (0, 2]$ the so-called Lévy-index (also the tail-index) of the jump distribution. While a Lévy-index with $\mu = 2$ corresponds to jump distributions with a finite second moment $\sigma^2 = \int dx\, x^2 f(x)$, a value of $\mu < 2$ describes a distribution with heavy tails, whose variance does not converge. In real space, the tails of such a distribution decay as $f(x) \propto |x|^{-\mu-1}$ for $|x| \to \infty$. Distributions with $\mu < 1$ even have a divergent mean value $\int dx\, x f(x)$.

For the simple form of $\hat{f}(k)$ in Eq. [71] one can compute $p_c(n)$ by elementary means. Obtaining the generating function $\tilde{q}_c(0,z)$ in a closed form is, however, far more complicated and requires more sophisticated techniques.

Without going into the details of the computations of Majumdar et al. [23], we will now summarize their results for the survival probability $q_c(0,n)$, the distribution of the record number $P(R_n|n)$, the mean record number $\langle R_n \rangle$ and the record rate $P_n$ of the biased random walk. In fact, they found five different universal regimes depending on the bias $c$ and the Lévy-index $\mu$, in which the asymptotic survival and record statistics have systematically different characteristics. These regimes are the following:

I - The subcritical case ($\mu < 1$): In this regime, the typical fluctuations of the position of the random walker $X_n$ grow faster than linearly (as $n^{1/\mu}$) and the effects of the drift are therefore negligible in the large $n$ limit. In fact, the survival probability $q_c(0,n)$ is proportional to $1/\sqrt{n}$ as in the unbiased case. The mean record number and the distribution of the record number have the same large $n$ behavior as the symmetric random walk, only their prefactors are different. Also the extremal ages of the shortest and longest lasting records have the same large $n$ asymptotics as in the symmetric case.

II - The marginal case ($\mu = 1$): In this interesting regime, the survival and record statistics have a more complicated dependence on $n$. In some sense, this is the regime where the drift and the fluctuations of the process are of the same order. Here, the survival probability $q_c(0,n)$ decays like

$$q_c(0,n) \propto n^{-\theta(c)},$$ (72)

with a $c$-dependent exponent $\theta(c) = \frac{1}{2} + \frac{1}{\pi}\arctan(c)$. This non-trivial exponent also appears in the mean record number. Majumdar et al. [23] showed that

$$\langle R_n \rangle \propto n^{\theta(c)} \quad \text{and} \quad P_n \propto n^{\theta(c) - 1}.$$ (73)

Also the full distribution of the record number $P(R_n|n)$ and the extremal ages of the shortest and longest lasting records depend on this function $\theta(c)$. In both cases the asymptotic results are more complicated and we refer to [23] for details.

A jump distribution, which falls into this regime, is the Cauchy distribution with $f(x) = \frac{1}{\pi (1 + x^2)}$. This special case was already considered by Le Doussal and Wiese [68] prior to the work of Majumdar et al. They also found the non-trivial exponent $\theta(c)$ and computed the exact mean record number, as well as its variance, for this case. For a biased random walk with a Cauchy jump distribution the mean record number reads

{Rn) = r[n + i]r[2-e(c)]' (74) 

Interestingly, the function $\theta(x)$ is also the cumulative distribution $\int_{-\infty}^{x} dx'\, f(x')$ of the Cauchy distribution. The reason for this agreement is, to our knowledge, unclear.
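For Cauchy jumps the exponent can be made explicit with a short calculation, which we sketch here as an illustration rather than as the derivation used in the cited works. The sum of $n$ standard Cauchy variables is again Cauchy distributed with scale $n$, so that $P[X_n < 0] = \frac{1}{2} - \frac{1}{\pi}\arctan(c) = 1 - \theta(c)$ for every $n$. Inserting this constant into the generalized Sparre Andersen theorem gives the generating function $(1-z)^{-(1-\theta(c))}$, whose coefficients are ratios of Gamma functions. The function names below are hypothetical.

```python
import math


def theta(c):
    """Exponent theta(c) = 1/2 + arctan(c)/pi."""
    return 0.5 + math.atan(c) / math.pi


def cauchy_survival(n, c):
    """Coefficient of z^n in (1 - z)^(-(1 - theta(c))): Gamma(n+1-theta) / (Gamma(n+1) Gamma(1-theta))."""
    th = theta(c)
    return math.exp(math.lgamma(n + 1 - th) - math.lgamma(n + 1) - math.lgamma(1 - th))


def local_exponent(n, c):
    """Estimate of the decay exponent from the ratio of survival probabilities at n and 2n."""
    return -math.log(cauchy_survival(2 * n, c) / cauchy_survival(n, c)) / math.log(2.0)
```

For $c = 1$ one has $\theta = 3/4$, and `local_exponent(10**4, 1.0)` is indeed very close to this value. For $c = 0$ the expression reduces to the symmetric result $\binom{2n}{n} 2^{-2n}$.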

III - The supercritical case with positive drift ($1 < \mu < 2$ & $c > 0$): Here, the survival probability decays faster than in the two previous cases. For $n \to \infty$, $q_c(0,n)$ behaves like

$$q_c(0,n) \propto \frac{1}{n^\mu}$$ (75)

and the mean record number grows linearly with $n$:

$$\langle R_n \rangle \approx a_\mu(c)\, n$$ (76)

with a parameter $a_\mu(c)$, which was also computed by Majumdar et al. [23]. The distribution of the record number has an interesting, non-trivial scaling form. In particular, $P(R_n, n)$ is given by

p, D \ 1 T , ( R n -a^(c)n \ 

P (R n , n) — — > — —V^ i (77) 

The scaling function $V_\mu(u)$ is of the form

V M (u) w c 1 M^e" C2 ' 1 ^ (78)
with constants $c_1$ and $c_2$, which depend only on $\mu$. Here, the age of the longest lasting
record grows like the inverse survival probability oc and the age of the shortest 
approaches a constant value proportional to $1 - a_\mu(c)$.

IV - The Brownian case with positive drift ($\mu = 2$ & $c > 0$): This is the regime of a Brownian random walk with a jump distribution that has a finite variance. Here, a positive drift has a strong effect on the survival probability and the mean record number. In fact, in contrast to regime III, the survival probability decays exponentially. Majumdar et al. [23] showed that

$$q_c(0,n) \propto \frac{1}{n^{3/2}}\, e^{-\frac{c^2}{2\sigma^2} n},$$ (79)

where $\sigma$ is the standard deviation of the jump distribution ($\sigma^2 = \int dx\, x^2 f(x)$). Despite this systematic difference between the survival probabilities of regimes III and IV, the mean record number has the same behavior. As in regime III, we have $\langle R_n \rangle \approx a_2(c)\, n$ (with a distribution dependent prefactor $a_2(c) = \lim_{\mu \to 2} a_\mu(c)$) and an asymptotically constant record rate $P_n = a_2(c) > 0$. Again in contrast to regime III, the distribution of the record number is Gaussian with mean value $\langle R_n \rangle$ and a standard deviation, which grows proportional to $\sqrt{n}$. The age of the longest lasting record grows logarithmically $\propto \ln n$, while the age of the shortest approaches a constant value as in regime III.

V - The supercritical case with negative drift ($\mu > 1$ & $c < 0$): The regime with $\mu > 1$ and negative drift is characterized by an asymptotically constant survival probability as well as a finite record number. Here, the drift eventually dominates the behavior and, beyond a certain time, records will no longer be possible. Majumdar et al. [23] computed the asymptotic survival probability $q_c(0,n) \approx \alpha_\mu(c)$ and showed that the asymptotically finite mean record number is given by the inverse of this value:

$$\langle R_n \rangle \approx \frac{1}{q_c(0,n)} \approx \frac{1}{\alpha_\mu(c)}.$$ (80)

Here, the parameter $\alpha_\mu(c)$ for $c < 0$ is related to the parameter $a_\mu(c)$ for $c > 0$ from the regimes III and IV. One finds that $\alpha_\mu(c) = a_\mu(|c|)$. The distribution of the record number for $n \to \infty$ has a simple geometric form:

$$P(R_n|n) \approx \alpha_\mu(c) \left(1 - \alpha_\mu(c)\right)^{R_n - 1}.$$ (81)

Due to the fact that the record number is finite, the ages of the shortest and longest lasting records grow linearly with $n$ in regime V.

In 2011, Wergen et al. [22] also considered the Brownian case (regime IV), but focused on the behavior of the record statistics for finite $n$ in the regime of a small drift $c$. Wergen et al. showed that, in this case, the survival probability and the record number of any biased random walk with a jump distribution that has a finite variance $\sigma^2$ ($\mu = 2$) are very similar to the corresponding quantities of a Gaussian random walk with the same $\sigma^2$. Expanding the survival probability $q_c(0,n)$ from Eq. [70] up to first order in $c$, one finds that, for $c\sqrt{n} \ll \sigma$,

* (0 ' n)S3 7s + i? (82) 



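With the methods described in section [5.1] this result can be used to obtain the mean record number and the record rate in the regime of $c\sqrt{n} \ll \sigma$:

$$\langle R_n \rangle \approx \sqrt{\frac{4n}{\pi}} + \frac{c}{\sigma} \frac{\sqrt{2}}{\pi} \left(n \arctan\left(\sqrt{n}\right)\right),$$ (83)

$$P_n \approx \frac{1}{\sqrt{\pi n}} + \frac{c}{\sigma} \frac{\sqrt{2}}{\pi} \arctan\left(\sqrt{n}\right).$$ (84)

For $n \gg 1$, the arctan-function approaches $\frac{\pi}{2}$, which leads to $\langle R_n \rangle \approx \sqrt{\frac{4n}{\pi}} + \frac{c\, n}{\sqrt{2}\, \sigma}$ and $P_n \approx \frac{1}{\sqrt{\pi n}} + \frac{c}{\sqrt{2}\, \sigma}$.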



5.3. Multiple random walks 

Another way to generalize the fundamental work of Majumdar and Ziff [41] is to consider ensembles of multiple random walks. In 2012, Wergen et al. [40] discussed the record statistics of the maximum $X_{\max,n}(N)$ of $N$ uncorrelated DTRW's with a symmetric jump distribution. They considered $N$ random walks with entries

$$X_{i,n} = X_{i,n-1} + \eta_{i,n} \quad (i = 1, ..., N \text{ and } X_{i,0} = 0)$$ (85)

and jumps $\eta_{i,n}$ drawn from a single symmetric jump distribution $f(\eta)$. The maximum of the random walks $X_{\max,n}(N)$ is defined as

$$X_{\max,n}(N) = \max\{X_{1,n}, X_{2,n}, ..., X_{N,n}\}.$$ (86)

Fig. [6] illustrates the record process of $X_{\max,n}(N)$ for $N = 4$ independent random walks.

Unfortunately, since the maximum of $N$ random walks does not exhibit the same renewal property as the single random walk, it is impossible to compute the record statistics from the survival probability $q(0,n)$, and Sparre Andersen's theorem [66], [67] is not useful here.

Because of that, it is not possible to compute the probability $P(R_n(N), n)$ for $R_n(N)$ records of the maximum $X_{\max,n}(N)$ of the $N$ random walks directly. However, Wergen et al. [40] found a more general way to calculate the record rate $P_n(N)$ that also works in absence of the renewal property.

Wergen et al. [40] argued that the probability that the maximum of $N$ random walks sets a new record with record value $x$ at time $n$ is given by $N$ times the first-passage probability $\phi(x,n)$ multiplied with the survival probability $q(x,n)$ to the power $N - 1$. This is because $N \phi(x,n)\, q(x,n)^{N-1}$ is just the probability that the value $x$ is first exceeded by one of the $N$ walkers in step $n$, while the other $N - 1$ stay below $x$. Integration over all possible record values $x$ leads to

$$P_n(N) = N \int_0^\infty dx\, \phi(x,n)\, q(x,n)^{N-1}.$$ (87)

Figure 6. Sketch of the record process of the maximum $X_{\max,n}(N)$ of $N = 4$ independent (Gaussian) random walks. The progression of the upper record of $X_{\max,n}(N)$ is indicated by the red line.

Therefore, to compute the record rate, we need the more general survival and first-passage probabilities $q(x,n)$ and $\phi(x,n)$ for an arbitrary $x > 0$.

Wergen et al. [10] computed the asymptotic behavior of these quantities for $n \to \infty$ using a highly non-trivial theorem due to Ivanov [69], which is, in some sense, a very general form of the Sparre Andersen theorem. To keep this essay readable, we will not describe the calculations of Wergen et al. and only summarize their results: The large $n$ behavior of $q(x,n)$ and $\phi(x,n)$ depends again on the Lévy-index $\mu$ (see section [5.2] and Eq. [71]) and one finds two universal regimes:



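I - The Brownian case with finite $\sigma^2$ ($\mu = 2$): Here, for $n \to \infty$, the first-passage and survival probabilities approach the following forms:

$$\phi(x,n) \approx \frac{x}{\sqrt{\pi}\, \sigma^2 n^{3/2}}\, e^{-\frac{x^2}{2\sigma^2 n}} \quad \text{and} \quad q(x,n) \approx \mathrm{erf}\left(\frac{x}{\sqrt{2n}\, \sigma}\right).$$ (88)

With these results one can compute the record rate directly. For a large number of random walks $N \gg 1$ one finds:

$$P_n(N) \xrightarrow{n \to \infty} \sqrt{\frac{\ln N}{n}}.$$ (89)

Apparently, for $n, N \gg 1$, the record rate of $N$ random walks is given by the record rate $P_n(N = 1)$ of the single random walk times $\sqrt{\pi \ln N}$. Similarly, the mean record number of $N \gg 1$ random walks approaches $\langle R_n(N) \rangle \approx \sqrt{4n \ln N} \approx \sqrt{\pi \ln N}\, \langle R_n(1) \rangle$.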
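The record process of the maximum of several walkers is straightforward to simulate. The following sketch assumes Gaussian jumps with unit variance and counts the initial value $X_{\max,0} = 0$ as the first record; the function name, seed and number of trials are choices made for this illustration. It is meant as a sanity check for small systems and does not attempt to resolve the asymptotic $\sqrt{\ln N}$ law, which requires large $n$ and $N$.

```python
import math
import random


def mean_records_of_maximum(n_walkers, n_steps, trials=2000, seed=7):
    """Monte Carlo estimate of the mean record number of the maximum of independent Gaussian walks."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        positions = [0.0] * n_walkers
        record_value = 0.0  # X_max,0 = 0 is the first record by definition
        count = 1
        for _ in range(n_steps):
            positions = [x + rng.gauss(0.0, 1.0) for x in positions]
            current_max = max(positions)
            if current_max > record_value:  # the leading walker exceeds the old record
                record_value = current_max
                count += 1
        total += count
    return total / trials
```

For a single walker the estimate can be compared with the exact value $(2n+1) \binom{2n}{n} 2^{-2n}$, e.g. about 8.04 for $n = 50$, and the mean record number grows when more walkers are added.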



II - Lévy flights with divergent $\sigma^2$ ($\mu < 2$): In this regime, it is not possible to compute the exact asymptotic behavior of $q(x,n)$ and $\phi(x,n)$. However, Wergen et al. [10] could extract the scaling behavior of the product $\phi(x,n)\, q(x,n)^{N-1}$ in Eq. [87] and found that

$$P_n(N) \approx \frac{2}{\sqrt{\pi n}} \quad \text{and} \quad \langle R_n(N) \rangle \approx \frac{4\sqrt{n}}{\sqrt{\pi}}.$$ (90)

In this case, these quantities become completely independent of $N$ and are exactly twice as large as the corresponding results for $N = 1$. The emergence of the prefactor 2 and the complete independence of $N$ are very interesting findings, which are not entirely understood so far.



Wergen et al. [10] also considered the distribution of the record number $P(R_n(N), n)$ in these two cases. As discussed before, because of the lacking renewal property, it is not possible to compute this distribution directly. However, in case I with $\mu = 2$, it is possible to conjecture the asymptotic distribution $P(R_n(N), n)$ from the corresponding distribution of ensembles of random walks with a discrete jump distribution $f(x) = \frac{1}{2}\left(\delta(x + 1) + \delta(x - 1)\right)$. For such an ensemble of lattice random walks, the record number is simply given by the maximum $M_n(N)$ of the process. Since the maximum of a single lattice random walk has a finite variance, the maximum value of $N$ random walkers must be distributed according to a Gumbel distribution in the limit of large $n$ and $N$. In fact, one finds that the mean value of the maximum $\langle M_n(N) \rangle$ of $N$ random walkers converges to $\langle M_n(N) \rangle \approx \sqrt{2n \ln N}$. The mean record number of the $N$ random walkers with a continuous jump distribution only differs by a prefactor of $\sqrt{2}$. Using this analogy one can infer that the record number in the continuous case has the following Gumbel distribution:

P (Rn (N) \n) ps e e , with z = — -= . (91) 

y/nhiN 

This conjecture was confirmed numerically in [10]. For Lévy flights with a heavy-tailed jump distribution ($\mu < 2$) it was not possible to compute $P(R_n(N)|n)$ by similar means. However, performing numerical simulations, Wergen et al. [10] found that this distribution seems to be entirely independent of the Lévy-index $\mu \in (0, 2)$ and the number of walkers $N \gg 1$. Because of this universality, it would be very interesting to find an analytical expression for this hitherto unknown distribution.



5.4. Continuous-time random walks

As another natural generalization of the symmetric DTRW studied by Majumdar 
and Ziff [?T| (section 5.1), Sabhapandit [43] studied continuous-time random walks 
(CTRW's). A CTRW is a process with entries $X(t_0), X(t_1), \ldots, X(t_n)$ that are recorded 
at random times $t_0 < t_1 < \cdots < t_n$ separated by random waiting times $\tau_i := t_i - t_{i-1}$ 
sampled from a (continuous) waiting-time distribution $\rho(\tau)$ (see Fig. 7). In this 
context, a simple discrete-time random walk can be seen as a process with entries 
$X(0), X(1), \ldots, X(n)$ at fixed times $t_0 = 0, t_1 = 1, \ldots, t_n = n$ with a degenerate 
waiting-time distribution $\rho(\tau) = \delta(\tau - 1)$. 

Figure 7. Sketch of the record process of a continuous-time random walk with 
random waiting times between the jumps (here, the waiting times were sampled from 
an exponential distribution). The progression of the upper (lower) record is indicated 
by the red (blue dotted) line. 



In the general case, the number of entries in a CTRW is itself random, which 
makes it more complicated to consider the statistics of records. The number of records 
$R(t)$ at an arbitrary, continuous time t is defined as 

$$ R(t) := \max\{ R_n \mid t_n < t \}. $$ (92) 
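
To make the definition of $R(t)$ concrete, the following Python sketch simulates a single CTRW and returns its record number at a given time. It is only an illustration under our own assumptions: Gaussian jumps, exponential waiting times (as in Fig. 7), the first entry counted as a record, and a hypothetical function name `ctrw_record_number`.

```python
import random


def ctrw_record_number(t_max, rng, mean_wait=1.0):
    """Return R(t_max) for one CTRW, following the definition in Eq. (92).

    Jumps are standard Gaussian and waiting times are exponential with mean
    mean_wait; both choices are illustrative assumptions. Only entries whose
    arrival time t_n is smaller than t_max contribute.
    """
    t = 0.0             # arrival time t_0 of the first entry
    x = 0.0             # position X(t_0)
    record_value = x
    records = 1         # the first entry counts as a record
    while True:
        t += rng.expovariate(1.0 / mean_wait)  # next arrival time t_n
        if t >= t_max:
            return records
        x += rng.gauss(0.0, 1.0)
        if x > record_value:
            record_value = x
            records += 1


if __name__ == "__main__":
    rng = random.Random(2)
    runs = 2000
    mean = sum(ctrw_record_number(100.0, rng) for _ in range(runs)) / runs
    print(f"mean record number at t = 100: {mean:.2f}")
```

Other waiting-time distributions can be studied by replacing the call to `rng.expovariate` with a different sampler.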

Sabhapandit discussed the record statistics of CTRW's in the limit $t \to \infty$. He 
found that the asymptotic behavior depends on the tail of the waiting-time distribution 
$\rho(\tau)$ and therefore introduced the Laplace transform $\tilde\rho(s)$ of $\rho(\tau)$: 

$$ \tilde\rho(s) = \int_0^\infty d\tau \, \rho(\tau) \, e^{-s\tau}. $$ (93) 

The behavior of the waiting-time distribution for large values of $\tau$ is encapsulated in the 
small-s behavior ($s \to 0$) of its Laplace transform. In general, $\tilde\rho(s)$ can be expanded in powers 
of s and, for small s, one finds $\tilde\rho(s) \approx 1 - (\bar\tau s)^\alpha$ with parameters $\bar\tau$ and $\alpha$ depending 
on the tail of the distribution $\rho(\tau)$. For waiting-time distributions with a finite mean 
value (in this case $\bar\tau$) one finds $\alpha = 1$, whereas a heavy-tailed waiting-time distribution without 
a first moment yields an $\alpha$ between 0 and 1. 
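
As a simple check of the small-s expansion, consider an exponential waiting-time distribution. This example is our own choice and is not taken from Sabhapandit's work; we write $\rho(\tau)$ for the waiting-time distribution and $\bar\tau$ for its mean:

```latex
\rho(\tau) = \frac{1}{\bar\tau}\, e^{-\tau/\bar\tau}
\quad\Longrightarrow\quad
\tilde\rho(s) = \int_0^\infty d\tau \, \frac{1}{\bar\tau}\, e^{-\tau/\bar\tau}\, e^{-s\tau}
              = \frac{1}{1 + \bar\tau s}
              = 1 - \bar\tau s + O(s^2).
```

To leading order this is of the form $1 - (\bar\tau s)^\alpha$ with $\alpha = 1$, and the parameter $\bar\tau$ is indeed the mean waiting time.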

Sabhapandit showed that the first-passage probability of the CTRW can be 
obtained from the existing result for the discrete-time random walk and computed 
the asymptotic record statistics for the two cases $\alpha = 1$ and $0 < \alpha < 1$. 

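In the first case, for a finite mean waiting time $\bar\tau$, the asymptotic record statistics of the 
CTRW are the same as in the time-discrete case. Here, for $t/\bar\tau \to \infty$, the record number 
$R(t)$ is distributed according to the half-Gaussian distribution 

$$ P(R(t) \mid t) \approx \frac{1}{\sqrt{\pi t/\bar\tau}} \exp\left( -\frac{\bar\tau R(t)^2}{4t} \right), $$ (94) 

which, for $\bar\tau = 1$, is exactly the record number distribution found in the discrete case. 
Similarly, for $\alpha = 1$, the mean record number and the record rate are given by 

$$ \langle R(t) \rangle \approx \frac{2}{\sqrt{\pi}} \left( \frac{t}{\bar\tau} \right)^{1/2} \quad \text{and} \quad P(t) \approx \frac{1}{\bar\tau \sqrt{\pi}} \left( \frac{t}{\bar\tau} \right)^{-1/2}. $$ (95) 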



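In the case of a divergent mean waiting time with $\alpha < 1$, the record number $R(t)$ 
approaches a different asymptotic distribution: 

$$ P(R(t) \mid t) \approx \frac{2}{\alpha} \, \frac{t}{\bar\tau} \, R(t)^{-1-2/\alpha} \, L_{\alpha/2}\left( \frac{t}{\bar\tau} \, R(t)^{-2/\alpha} \right), $$ (96) 

where $L_{\alpha/2}(z)$ is the pdf of a one-sided Lévy-stable distribution, which, in general, cannot 
be expressed in closed form. The Laplace transform of $L_{\alpha/2}(z)$ is given by $\tilde{L}_{\alpha/2}(s) = e^{-s^{\alpha/2}}$. 
For $\alpha < 1$, the mean record number and the record rate grow more slowly with t. 
Sabhapandit [43] found that, for an arbitrary $\alpha \in (0, 1]$, 

$$ \langle R(t) \rangle \approx \frac{2}{\alpha \, \Gamma[\alpha/2]} \left( \frac{t}{\bar\tau} \right)^{\alpha/2} \quad \text{and} \quad P(t) \approx \frac{1}{\bar\tau \, \Gamma[\alpha/2]} \left( \frac{t}{\bar\tau} \right)^{\alpha/2 - 1}, $$ (97) 

in good agreement with his findings for $\alpha = 1$ in Eq. (95). 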
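
The consistency between the general result and the case $\alpha = 1$ can be checked numerically. The sketch below assumes that Eq. (97) reads $\langle R(t) \rangle \approx \frac{2}{\alpha \Gamma[\alpha/2]} (t/\bar\tau)^{\alpha/2}$ and $P(t) \approx \frac{1}{\bar\tau \Gamma[\alpha/2]} (t/\bar\tau)^{\alpha/2-1}$; the function names are hypothetical and only serve this illustration.

```python
import math


def mean_record_number(t, tau, alpha):
    """Asymptotic <R(t)> = 2 / (alpha * Gamma(alpha/2)) * (t/tau)**(alpha/2)."""
    return 2.0 / (alpha * math.gamma(alpha / 2.0)) * (t / tau) ** (alpha / 2.0)


def record_rate(t, tau, alpha):
    """Asymptotic P(t) = (t/tau)**(alpha/2 - 1) / (tau * Gamma(alpha/2))."""
    return (t / tau) ** (alpha / 2.0 - 1.0) / (tau * math.gamma(alpha / 2.0))


if __name__ == "__main__":
    t, tau = 100.0, 1.0
    # For alpha = 1, Gamma(1/2) = sqrt(pi), so <R(t)> reduces to 2*sqrt(t/(pi*tau)).
    print(mean_record_number(t, tau, 1.0), 2.0 * math.sqrt(t / (math.pi * tau)))
    # A heavier tail (smaller alpha) yields fewer records at the same time t.
    print(mean_record_number(t, tau, 0.5))
```

In this form the record rate is simply the time derivative of the mean record number, which provides a second consistency check.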



5.5. Records in higher-dimensional processes 

In 2011, Edery et al. jH] considered DTRW's in two and three dimensions and discussed 
the record statistics of the distance of such a process from the origin. In the case of a 
one-dimensional random walk, this distance at time n is just given by $|X_n| = \sqrt{X_n^2}$. 
Even in this case, it has not yet been possible to compute the exact record rate and the 
distribution of the record number analytically. 

Edery et al. were interested in DTRW's on an orthogonal lattice in two and three 
dimensions. At each time step, such a random walker jumps from one lattice site to an 
adjacent site in a random direction. They analyzed the number of records in the series 
of distances $|X_0|, \ldots, |X_n|$ from the origin using numerical simulations. 

Edery et al. began with a discussion of a lattice random walk with a 
symmetric distribution of the jumps. In this case, they could demonstrate that the 
mean record number of this process has the same scaling behavior as in the case of the 
discrete-time random walk in one dimension. Without a bias, the mean record number 
grows proportionally to $\sqrt{n}$. 

In the case of a biased lattice random walk, with a drift in an arbitrary direction, 
the asymptotic behavior changes. In all three considered dimensions, the asymptotic 
mean record number grows linearly in n. 
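
A simulation of this kind can be sketched in a few lines of Python. The code below is a minimal illustration and not the procedure used by Edery et al.: it is restricted to two dimensions, does not count the starting site as a record, and models the bias in a simple way that we assume here (with probability `bias` the step is forced in the +x direction). The names `count_distance_records` and `lattice_walk` are hypothetical.

```python
import random

STEPS_2D = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def count_distance_records(path):
    """Count strict records of the distance from the origin along a path.

    path is a sequence of (x, y) lattice sites. Squared distances are
    compared, which gives the same records without floating-point roots.
    The starting site is not counted (a convention assumed here).
    """
    best = path[0][0] ** 2 + path[0][1] ** 2
    records = 0
    for x, y in path[1:]:
        d2 = x * x + y * y
        if d2 > best:
            best = d2
            records += 1
    return records


def lattice_walk(n_steps, rng, bias=0.0):
    """Nearest-neighbor walk on the square lattice, starting at the origin.

    With probability bias the step goes in the +x direction; otherwise one
    of the four directions is chosen uniformly.
    """
    x = y = 0
    path = [(x, y)]
    for _ in range(n_steps):
        if rng.random() < bias:
            dx, dy = 1, 0
        else:
            dx, dy = rng.choice(STEPS_2D)
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


if __name__ == "__main__":
    rng = random.Random(3)
    for bias in (0.0, 0.2):
        runs = 200
        mean = sum(count_distance_records(lattice_walk(1000, rng, bias))
                   for _ in range(runs)) / runs
        print(f"bias = {bias}: mean record number {mean:.1f}")
```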

6. Applications 

6.1. Climate records 

The most popular application of the theory of records in recent years was certainly the 
study of temperature records. The evident and most likely man-made increase of the 
global mean temperature over the last decades [70] raised the question about the effects 
of this climatic change on the occurrence and the magnitude of extreme and 
record-breaking events [ZH [72] [73] SI EH CHI ESI [77] [78]. While it is intuitively clear 
that a warming climate leads to more heat records and fewer cold records, the first systematic 
application of theoretical results from record statistics was presented by Benestad in 
2003 and 2004 [21 [6]. He compared the record process of monthly temperature mean 
values from Scandinavian weather stations with i.i.d. RV's and found a small but 
significant increase in the number of heat records. Interestingly, he also considered 
daily precipitation sums, where he could not determine any non-stationary behavior of 
the record rate p3, [79]. 

In 2006, Redner and Petersen considered daily temperature measurements from 
a single weather station in Philadelphia. Even though their data set covered more 
than 100 years, they had difficulties quantifying the effect of global warming on the 
measurements from this station. Nevertheless, Redner and Petersen made important 
progress on the matter. In fact, they proposed a simple model of a Gaussian distribution 
with a linear drift to describe the record statistics of temperature measurements for 
individual calendar days. Within this model, a daily (mean, minimum or maximum) 
temperature $T_n$ in the nth year of an observation period is sampled from a Gaussian 
distribution with an increasing mean value $\mu_0 + cn$. Here, c is the drift, which is basically 
the speed of warming. Then, the probability density of the daily temperatures should 
be of the form 

$$ p_n(T) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(T - \mu_0 - cn)^2}{2\sigma^2} \right), $$ (98) 

where $\sigma$ is the standard deviation that describes the fluctuations of the measurement 
around the moving mean value. Evidently, this is exactly the Linear Drift Model 
(LDM) we discussed in section 3 for Gaussian RV's. 
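
The Gaussian LDM is straightforward to sample. The following Python sketch is a minimal illustration under our own assumptions (the parameter values and the function names `sample_ldm` and `count_upper_records` are hypothetical): it draws independent temperatures with a linearly increasing mean and counts the upper records in the resulting series.

```python
import random


def count_upper_records(series):
    """Number of strict upper records; the first entry always counts."""
    records = 0
    best = float("-inf")
    for value in series:
        if value > best:
            best = value
            records += 1
    return records


def sample_ldm(n, c, sigma, rng, mu0=0.0):
    """Draw T_1, ..., T_n independently with T_k ~ Normal(mu0 + c*k, sigma)."""
    return [rng.gauss(mu0 + c * k, sigma) for k in range(1, n + 1)]


if __name__ == "__main__":
    rng = random.Random(4)
    n, sigma, runs = 50, 1.0, 5000
    for c in (0.0, 0.01):
        mean = sum(count_upper_records(sample_ldm(n, c, sigma, rng))
                   for _ in range(runs)) / runs
        print(f"c/sigma = {c / sigma}: mean number of upper records {mean:.2f}")
```

Lower (cold) records can be counted with the same function by passing the negated series.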

The work of Redner and Petersen motivated many others to study both the 
statistics of record-breaking temperatures [El El QUI EH [HI EEj and the simple 
LDM [31] [32] [33]. In 2009, Meehl et al. [8] analyzed a large number of U.S. weather 
stations with respect to the occurrence of heat and cold records and found a significant 
effect of global warming in the ratio of the two. In 2010, Wergen and Krug [9] 
confirmed these findings in an independent study of European station and re-analysis 
data [58] [81]. The work of Newman et al. [10], Anderson and Kostinski [80], Elguindi 
et al. [11] as well as Rahmstorf and Coumou [12] led to similar results. 

The main subject of these studies was a comparison between time series of 
temperature measurements for individual calendar days or months and uncorrelated 
RV's from the LDM or related, slightly more complicated models. In fact, it is by now 
well established that a Gaussian LDM, despite its simplicity, can describe the effect 
of global warming on the occurrence of daily temperature records relatively accurately. 
From 1960 to 2010, the global mean temperature increased in a roughly linear manner 
[70] H]. This effect is also found in measurements from European and U.S. weather 
stations and re-analysis data. At the same time, the standard deviation of the daily and 
monthly temperatures around their moving mean value remained more or less constant 
[70] HI [9]. Since the magnitude of the warming in recent years is still much smaller than 
the average standard deviation of daily temperatures, one can compare the temperature 
measurements with a Gaussian LDM in the regime of small $c \ll \sigma/\sqrt{n}$ (see section 3). 
Here, with the findings of Franke et al. [33], the record rate $P_n(c)$ in a time series with 
a linear drift (warming) c and standard deviation $\sigma$ is given by: 

$$ P_n(c) \approx \frac{1}{n} + \frac{c}{\sigma} \, \frac{2\sqrt{\pi}}{e^2} \sqrt{\ln\left( \frac{n^2}{8\pi} \right)}. $$ (99) 

Evidently, in this approximation, the important degree of freedom is the normalized 
drift $\hat{c} := c/\sigma$. For a fixed n, a large increase in the record rate can be caused by a large 
positive drift c or a small standard deviation $\sigma$. 

In most of the considered daily data sets, the normalized drift between 1960 and 
2010 was of the order $\hat{c} \approx 0.01\,\mathrm{y}^{-1}$. For this value, Eq. (99) predicts an increase in the 
record rate of about 27% after 30 years and of more than 50% after 50 years of warming. 
These predictions are in good agreement with the data found in the literature. For 
instance, Wergen et al. [9] considered daily maximum temperatures measured at 202 
European stations [81] between 1975 and 2005 and found an increase of around 40% in 
the number of heat records along with a normalized drift of $\hat{c} \approx 0.015\,\mathrm{y}^{-1}$. 
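
The quoted percentages can be reproduced with a short calculation. The sketch below assumes that the small-drift approximation of Eq. (99) reads $P_n(c) \approx 1/n + (c/\sigma)\,(2\sqrt{\pi}/e^2)\,\sqrt{\ln(n^2/8\pi)}$ and compares it with the stationary record rate $1/n$; the function names are hypothetical.

```python
import math


def ldm_record_rate(n, c_over_sigma):
    """Small-drift record rate of the Gaussian LDM in the form assumed for Eq. (99)."""
    correction = (2.0 * math.sqrt(math.pi) / math.e ** 2) * math.sqrt(
        math.log(n * n / (8.0 * math.pi))
    )
    return 1.0 / n + c_over_sigma * correction


def relative_increase(n, c_over_sigma):
    """Increase of the record rate relative to the stationary value 1/n."""
    return n * ldm_record_rate(n, c_over_sigma) - 1.0


if __name__ == "__main__":
    for n, drift in ((30, 0.01), (50, 0.01), (30, 0.015)):
        print(f"n = {n}, c/sigma = {drift}: +{100 * relative_increase(n, drift):.0f}%")
```

For these three parameter sets the sketch gives increases of roughly 27%, 51% and 41%, in line with the values quoted above.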

For monthly mean temperatures the normalized drift is usually larger, since the 
standard deviation of these averaged values is much smaller than in the case of daily 
measurements. Therefore, the rate of monthly upper records can be many times as high 
as expected in the case of a stationary climate [80] [82]. Of course, for annual mean 
values this effect is even stronger and, over the last decade, the rate of new global mean 
temperature records was increased by around 2800% [76]. 

As shown by Wergen et al. [9] and recently, in more detail, also by Elguindi et 
al. [11], the normalized drift $\hat{c}$ has strong regional variations. In Europe, $\hat{c}$ seems to be 
generally larger than in the U.S., where $\hat{c}$ for the daily data is usually much smaller than 
$0.01\,\mathrm{y}^{-1}$ [9]. Due to the high heat capacity of water, the standard deviation of the daily 
measurements is much smaller near or over the oceans. Because of that, time series of 
these measurements can have a very large $\hat{c}$ and therefore a very strong effect of global 
warming on the record rate. Stations far away from the sea, on the other hand, can 
have a very high standard deviation, which reduces the effect of the drift on the record 
rate. This explains why the increase of the record rate in Europe was much stronger 
than in the U.S. [9]: the temperature fluctuations at the U.S. stations, in particular at 
those in the middle of the continent, are much larger than at the European stations and 
therefore the normalized drift $\hat{c}$ of the U.S. stations is much smaller. 

In a recent study, Wergen et al. [82] discussed the statistics of the values of 
record-breaking temperatures. While the preceding literature focused only on the occurrence 
of temperature records, it is also possible to describe the effect of global warming on the 
values of these records using the LDM. However, here, a Gaussian LDM is appropriate 
only in the summer months, where the values of heat records are significantly increased 
by the warming. In the winter, especially in cold, sub-polar regions, the distribution 
of daily temperatures is highly asymmetric and favors extremely cold temperatures in 
comparison with the Gaussian case. Therefore, despite global warming, cold records 
are still much further away from the temperature mean value than heat records [82]. 
In other words, despite global warming, we can still expect extreme, record-breaking 
cold days in winter. 

6.2. Records in finance 

While, in the study of temperature records, the observational data were compared to 
uncorrelated random variables, the financial markets provide a good example of highly 
correlated processes. A simple way to model a stock that is traded on a stock market 
is the Geometric Random Walk model (GRM), which was, in a slightly different form, 
already proposed by Bachelier in 1900 [83]. The GRM describes the logarithms of 
stock prices with a simple biased random walk (as in section 5.2). In recent work, 
Wergen et al. [22] H2], as well as Bogner [M], discussed the record statistics of daily 
stock data from the U.S. Standard & Poor's 500 (S&P 500) stock index [85] in the 
context of this model. 

They considered the logarithms of time series of daily stock prices $S_0, S_1, \ldots, S_n$. 
Within the GRM, these logarithmic prices $\ln S_n$ should behave like a biased random 
walk with 

$$ \ln S_i = \ln S_{i-1} + \eta_i + c $$ (100) 

with random jumps (daily returns) $\eta_i$ from a symmetric (return) distribution $f(\eta)$ 
and a bias c. The bias represents a systematic, long-term growth in the system 
and leads, asymptotically, to an exponential growth of $S_n$. Since the logarithm is 
monotonic, a record in a series of stock prices $S_0, S_1, \ldots, S_n$ is also a record in the series 
$\ln S_0, \ln S_1, \ldots, \ln S_n$ and one can use the results for the record statistics of biased random 
walks to model the records of the stock prices. 
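
The equivalence between records of prices and records of logarithmic prices can be illustrated with a short Python sketch. It generates a geometric random walk according to Eq. (100) with Gaussian returns, which is an assumption made only for this illustration, as are the parameter values and the function names `geometric_random_walk` and `count_upper_records`.

```python
import math
import random


def count_upper_records(series):
    """Number of strict upper records; the first entry always counts."""
    records = 0
    best = float("-inf")
    for value in series:
        if value > best:
            best = value
            records += 1
    return records


def geometric_random_walk(n, c, sigma, rng, s0=1.0):
    """Prices S_0..S_n with ln S_i = ln S_{i-1} + eta_i + c, eta_i ~ Normal(0, sigma)."""
    log_price = math.log(s0)
    prices = [s0]
    for _ in range(n):
        log_price += rng.gauss(0.0, sigma) + c
        prices.append(math.exp(log_price))
    return prices


if __name__ == "__main__":
    rng = random.Random(5)
    prices = geometric_random_walk(1000, 0.0004, 0.02, rng)
    log_prices = [math.log(p) for p in prices]
    # Both series have their records at the same positions.
    print(count_upper_records(prices), count_upper_records(log_prices))
```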

As it turns out, the GRM is, to a certain degree, useful to predict the record number 
of daily stock prices from the S&P 500. Wergen et al. [22] considered series of daily 
stock prices of 366 stocks from this index and analyzed the progression of the mean 
record number $\langle R_n \rangle$ of these stocks. 

To compare with the analytical findings for the biased random walk, they 
determined the drift c and the standard deviation $\sigma$ of the jump distribution of the 
daily returns from the stock data. In the relevant regime of $c\sqrt{n} \ll \sigma$, they predicted a 
mean record number of (see section 5.2) 




(101) 



Again, as in the case of temperature records in a warming climate, the relevant 
parameter that describes the effects of the bias is the normalized drift $\hat{c} = c/\sigma$. For the 
S&P 500 stock data one finds a value of $\hat{c}$ between $0.018\,\mathrm{d}^{-1}$ and $0.025\,\mathrm{d}^{-1}$ depending 
on the length of the observation period. 

Wergen et al. [32] [86] found that Eq. (101) predicts the qualitative behavior of the 
mean record number of the stock prices to some accuracy. Even though it slightly 
overestimates both the number of upper and the number of lower records, the difference 
between the two is modeled correctly. 

In a similar study, Wergen et al. [10] considered ensembles of multiple stocks from 
the S&P 500 and compared them with their analytical results for multiple independent 
random walkers (section 5.3). They rescaled and detrended the daily stock data to make 
them comparable with symmetric random walks with a jump distribution of unit standard 
deviation. Ensembles of N of these rescaled stocks were then compared with the 
analytical result for the mean record number of the maximum of N independent random 
walks (see section 5.3): 

$$ \langle R_n(N) \rangle \approx 2\sqrt{n \ln N}. $$ (102) 

Interestingly, one finds that the maximum of N detrended and rescaled stocks also grows 
proportionally to $\sqrt{n \ln N}$, but with a prefactor smaller than that of the 
independent random walks. In [10] this was tentatively interpreted in terms of a smaller, 
effective number of stocks that are stochastically independent in the context of record 
statistics. In view of the ongoing research on the important role of correlations between 
stocks in financial markets, it would be interesting to better understand the meaning of 
the effective number in the future. 

6.3. Physics and biology 

Interestingly, in physics and also in evolutionary biology, one finds several complex 
dynamical systems that behave like the record process of i.i.d. RV's. In particular, 
some diffusive processes in random environments, like the random energy landscape by 
Derrida [87], can be described using record statistics. These systems are usually stable 
on short time scales, but run through intermittent events, so-called quakes, which bring 
them from one stable state to another. As it turns out, these quakes can be modeled as 
record events in time series of i.i.d. RV's. 

An important feature of the record process of i.i.d. RV's is that it can be described as 
a Poisson process in logarithmic time. According to Sibani and Littlewood [88] (see also 
the reviews by Jensen [13] and Anderson et al. [18]), the distribution of the logarithmic 
waiting times $\Delta_k := \ln t_k - \ln t_{k-1}$ between the (k - 1)st and the kth record is given by 
the exponential distribution with the pdf $p(\Delta) = e^{-\Delta}$. With this one can show that 
the probability $P_k(t)$ of having $k \gg 1$ records up to time $t \gg k$ is given by 

$$ P_k(t) \approx \frac{(\ln t)^k}{k!} \, \frac{1}{t}. $$ (103) 
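
A log-Poisson law of this kind is simply a Poisson distribution with mean $\ln t$, which follows from exponentially distributed logarithmic waiting times with unit rate. The following Python sketch assumes the form $P_k(t) \approx (\ln t)^k / (k!\, t)$ and checks its normalization and its mean; the function name `log_poisson_pmf` is hypothetical.

```python
import math


def log_poisson_pmf(k, t):
    """P_k(t) = (ln t)^k / (k! * t) for t > 1: a Poisson law with mean ln t."""
    log_t = math.log(t)
    # Evaluate in log space to avoid overflow of (ln t)^k and k! for large k.
    return math.exp(k * math.log(log_t) - math.lgamma(k + 1) - log_t)


if __name__ == "__main__":
    t = 1000.0
    probabilities = [log_poisson_pmf(k, t) for k in range(100)]
    total = sum(probabilities)
    mean = sum(k * p for k, p in enumerate(probabilities))
    print(f"normalization: {total:.6f}, mean: {mean:.4f}, ln t: {math.log(t):.4f}")
```

The mean number of records therefore grows only logarithmically in time, which is why such relaxation processes slow down so strongly.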



This kind of log-Poisson statistics is also found in various dynamical systems like, for 
instance, the Edwards-Anderson spin-glass model [14] [15] [13]. The time evolution of 
the (local) energy $E(t)$ of such a spin glass, which relaxes towards a lower energy state 
after an initial quench, is characterized by a series of local energy minima $E_{\min}(k)$ and 
maxima $E_{\max}(k)$. In order to get from one stable state with energy $E_{\min}(k)$ to the 
next with energy $E_{\min}(k+1)$, the system needs to overcome an energy barrier 
$\Delta E_k = E_{\max}(k) - E_{\min}(k)$. Therefore, the noise-driven system needs a fluctuation 
of the size $\Delta E_k$ to relax to the next stable state. Now, as shown by Sibani et al. 
[14] [15] [13], these energy barriers $\Delta E_k$ are usually monotonically increasing in k and 
one finds $\Delta E_k < \Delta E_{k+1}$. Because of that, the fluctuation necessary to overcome $\Delta E_{k+1}$ 
has to be (slightly) larger than the one needed for $\Delta E_k$. In other words, it requires a 
record-breaking event in the series of fluctuations for the system to relax further. Since 
these fluctuations are usually assumed to be i.i.d. RV's, one can assume that the process 
of jumps (quakes) from one stable state to another has the same time evolution as the 
record process of i.i.d. RV's, which is described by Eq. (103). 

A similar behavior is found in the so-called Restricted Occupancy Model, which was 
proposed in the context of the theory of type-II superconductors [14] [16]. This model 
describes the gradual magnetization of a superconducting sample by an external 
magnetic field. In this context, the number of flux vortices inside a three-dimensional 
model of the type-II superconductor increases step-wise and monotonically in time. The 
occurrence of the steps (quakes) in the vortex number also exhibits the same log-Poisson 
statistics as the record process of i.i.d. RV's. 

As it turns out, the record process of i.i.d. RV's can be used to describe various 
dynamical models related to the random energy model. In the context of evolutionary 
biology, several authors studied the connection between record statistics and adaptation 
on the fitness landscapes of genotypes. Such a landscape assigns a fitness value to each 
genotype, with the genotypes arranged on a (high-dimensional) cubic lattice similar to a lattice of spins. 
Kauffman and Levin [17], Sibani et al. [89], as well as Krug and Jain [19] discussed 
mutations on random fitness landscapes with i.i.d. fitness values and compared the rate 
of their occurrence with the record rate of i.i.d. RV's. The main idea is that, in order to 
survive and take over a population, a mutant with random fitness has to be fitter than 
all previous mutants in an evolutionary process. Therefore, it must be a mutant with 
record-breaking fitness. 

A detailed discussion of the applications of record statistics in evolutionary biology 
can, for instance, be found in [90]. 

6.4. Athletic records 

Even though the occurrence of records in sports, for instance in athletics or in 
swimming, receives an enormous amount of public attention, only very few authors have studied 
the statistical properties of these sports records so far. In the context of the ongoing 
controversy about the role of legal and illegal doping in the performance of athletes, 
the theory of records provides a method to distinguish between statistical fluctuations 
and real improvements. Because of the universal features of the record statistics of 
i.i.d. RV's, one can analyze the occurrence of records in time series of sports results 
without detailed knowledge about the underlying distribution from which the results 
are sampled. In principle, if the number of records in a series of sports results is 
significantly larger (or smaller) than in an i.i.d. series of comparable length, this cannot 
be in agreement with a constant performance level of the athletes. However, when 
analyzing historical data, it is hard to determine the total number of attempts in a 
certain event and it is therefore difficult to determine the record rate $P_n$. Usually only 
a small number of very good performances is recorded on the leaderboard and one can 
only analyze the statistics of their values. 
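
The universality that makes such a test possible can be illustrated numerically. For i.i.d. RV's from any continuous distribution the record rate is $P_n = 1/n$, so that the mean record number is the harmonic number $H_n = \sum_{k=1}^{n} 1/k$. The Python sketch below compares this value with simulations for two different distributions; the choice of distributions, the sample sizes and the function names are our own assumptions.

```python
import random
from fractions import Fraction


def expected_iid_records(n):
    """Exact mean record number H_n = sum_{k=1}^{n} 1/k for i.i.d. continuous RV's."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def simulated_mean_records(draw, n, runs):
    """Average number of upper records in runs series of n values returned by draw()."""
    total = 0
    for _ in range(runs):
        best = float("-inf")
        for _ in range(n):
            value = draw()
            if value > best:
                best = value
                total += 1
    return total / runs


if __name__ == "__main__":
    rng = random.Random(6)
    n, runs = 100, 2000
    print("exact:      ", float(expected_iid_records(n)))
    print("Gaussian:   ", simulated_mean_records(lambda: rng.gauss(0.0, 1.0), n, runs))
    print("exponential:", simulated_mean_records(lambda: rng.expovariate(1.0), n, runs))
```

A series of results whose record number deviates significantly from this distribution-free benchmark is then a candidate for a non-stationary performance level.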

So far, the only systematic analysis of athletic records was published by Gembris 
et al. [2] [3]. They considered the evolution of the record values of several track and field 
events and compared them to theoretical results for the maximal values of Gaussian 
i.i.d. RV's. They estimated the mean value and the standard deviation of the athletic 
performances for the time series of individual events. Then they compared the record 
events in the athletic data with series of Gaussian RV's with the same parameters. A 
comparison of the record values makes it possible to identify events in which the athletes improved 
significantly over the duration of the time series. It turns out that only in some 
events the record values actually improve faster than expected on the basis of constant 
athletic capabilities. In 50% to 80% of all considered track and field events, Gembris et 
al. [2] [3] could not disprove their null hypothesis of a stationary distribution of athletic 
performances. Interestingly, in the cases where they could detect a systematic 
time dependence, the increase in the performance seemed to be far from linear in time. If 
the performances of athletes improved due to better training, nutrition, or just a 
growing population, one would expect a continuous effect on the record rate. Instead, 
the progressions of some athletic record values, especially in long-distance running and 
in throwing, are characterized by large jumps, which are probably best explained by 
instantaneous effects like the introduction or the prohibition of certain drugs. In fact, 
while the record values of several long-distance running events, like 5000 m or 10000 m, 
improved drastically after the introduction of blood doping with erythropoietin, almost 
no records were set in the throwing events since anabolic drugs became detectable in 
urine samples [2] [3]. 

Another interesting problem in this context is the question of an absolute limit 
to world records in sports, like, for instance, a hypothetical speed that cannot be 
exceeded by humans, which would lead to a lower boundary for the possible outcome 
of a 100 m dash race. Up to now, several different methods were applied to find such 
a boundary, but the issue is still controversial [91] [92] [93] [94]. It might be possible 
to answer this question using extreme-value or record statistics, since these exhibit 
different universal properties for distributions with a bounded and an infinite support. 
A first step towards this goal was taken by Einmahl and Magnus in 2008 [95]. They 
estimated the tail behavior of the distributions of performances in track and field events 
by comparing them to a generalized Pareto distribution. In most considered cases, they 
could find an absolute limit to the world records. It would be interesting to confirm and 
improve their findings in future studies. 

7. Summary and outlook 

In this review, we tried to summarize numerous interesting and non-trivial results on the 
statistics of records, which were discovered in the last couple of decades. Especially in the 
last 10 years, the study of record-breaking events has become a broad and diverse field 
of research. Additionally, the occurrence and the properties of records were analyzed 
and discussed in a vast number of observational data sets. Researchers have understood 
that records are often more than just interesting to the observer: one can learn a lot 
about the properties of a complex dynamical system by considering the record events it 
generates. 

In this context, there are many open problems that might serve as subjects for 
future research. The research on the record statistics of uncorrelated random variables 
with time-dependent distributions has just begun and only the very simple cases of a 
constant linear drift and an increasing standard deviation have been understood to some 
degree. Of course, one can ask how the record rate and also the full distribution of 
the record number are affected by a more complicated time dependence of the underlying 
distribution. For the Linear Drift Model and the Increasing Variance Model discussed 
in this review, it would also be interesting to compute the mean values of records that 
have a certain record number, or occur at a certain time. 

Furthermore, the record statistics of correlated random variables are only 
understood for a few special cases. Especially in the context of possible applications 
in finance, it would be very interesting to calculate the record rate of more complicated 
non-Markovian processes. Some interesting candidates for future research are branching 
random walks, the absolute value of a random walk and the Ornstein-Uhlenbeck process, 
which is particularly important for the modeling of financial data [96]. In general, it 
would be desirable to better understand the effect of long-term or power-law type 
correlations on the record statistics of stochastic processes. 

With respect to the various applications of the theory, one easily finds numerous 
interesting open questions. For instance, in climatology, it is hardly understood how 
specific weather conditions affect the occurrence of records. Here, it is also still unclear if 
one can find a significant effect of climatic change on the record statistics of precipitation 
events or, for instance, also record-breaking storms. In finance, our understanding of the 
statistics of record-breaking stock prices still does not explain some interesting deviations 
from the classical model of a geometric random walk. It is a challenging problem to find 
a more accurate description for this record process. 